Editor's Introduction




Jane C. Blake 

Editor 



The design of semiconductor chips has been the 
topic of several past Digital Technical Journal 
issues. With the introduction of Alpha 21064, the 
world's fastest microprocessor, this issue focuses 
for the first time on the development of semi- 
conductor technologies that make possible the high- 
performance of Digital's VLSI chips. Engineers in 
Advanced Semiconductor Development present 
in-depth views into CMOS-4 technologies, which
produce microprocessors with up to 1.7 million
transistors and operating frequencies as high as 200 MHz.

The significant increase in performance achieved
with each generation of CMOS technology is in part 
the result of a synergistic relationship between 
microprocessor design and process engineers. In 
their paper on process technology contributions 
to microprocessor performance, Bjorn Zetterlund, 
Jim Farrell, and Frank Fox describe the scaling the- 
ory that has led to a doubling of gate density and 
an increase of 30 percent in gate speed in four suc- 
cessive CMOS generations. They discuss process 
features implemented in CMOS-4, and close with a dis- 
cussion of models that predict process variations. 

Models and tools, essential in providing design- 
ers early insight into the characteristics of the tran- 
sistors to be fabricated, are the focus of a paper by 
Marden Seavey, John Faricelli, Nadim Khalil, Gerd 
Nanz, Llanda Richardson, Christian Schiebl, Hamid 
Soleimani, and Martin Thurner. The authors describe 
several physical models that accurately simulate 
transistor behavior, and present numerical mathe- 
matical methods used to enhance existing simula- 
tors. An overview of Digital's and others' efforts to 
integrate simulation tools concludes the paper. 

The need for both high-density logic gates and 
on-chip cache memory in microprocessors pre- 
sents special challenges to process engineers. 
Andre Nasr, Greg Grula, Antonio Berti, and Rich
Jones review the front-end process (formation of
device and local interconnect) for the CMOS-4
0.75-μm technology and the steps taken to meet
design requirements. They also describe the effects 
on submicron devices related to the scaling of fea- 
ture sizes and examine some solutions. 

Goals for the back-end process (formation of 
global metal interconnect) were also driven by the 
logic design requirements for higher circuit density.
In addition, back-end development goals included
the continued use of equipment developed for
the 1.0-μm CMOS-3 technology. Marion Garver, Joe
Bulger, Tom Clark, Jamshed Dubash, Lorain Ross, 
and Dan Welch relate how tools were modified for 
CMOS-4 and describe new blanket tungsten and pla- 
narization processes for submicron devices. 

To produce a specified yield of CMOS devices, 
defect reduction and yield enhancements, like other 
processes, must be initiated concurrently with 
the design stage. Mary Beth Nasr and Ellen Mager 
review the principles of microcontamination con- 
trol and outline defect reduction techniques to 
increase product yield in the areas of p -gate leakage 
and metal 2 short circuits. The paper that follows 
addresses the methodology of yield enhancement, 
including processing, process equipment, manu- 
facturing, and design and test. Randy Collica, Joe 
Dietrich, Rudy Lambracht, and Dave Lau describe 
the use of test chip data, yield models, and the 
selected approach to yield analysis and forecasting. 

An advanced method that helps designers predict 
circuit hot carrier lifetime and thus maximize
transistor performance at the required reliability level
is the topic of a paper by Dan Jackson, David Bell, 
Brian Doyle, Bruce Fishbein, and David Krakauer. 
The authors describe a physically based method 
for determining the acceptability of hot carrier- 
induced degradation in transistor characteristics. 

Another predictor of chip lifetime is the reliabil- 
ity of the interconnects. In their paper on electro- 
migration reliability, Joe Clement, Eugenia Atakov, 
and Jim Lloyd provide a helpful overview of the 
potential erosion in metal interconnect due to 
electron conduction. They then present a scaling 
model developed to characterize the reliability of 
CMOS-4 chip interconnects. 

The editors thank Rich Hollingsworth and Arlene 
Delvy of the Advanced Semiconductor Develop- 
ment Group for their guidance and unfailing sup- 
port in developing this issue. 
degrees in electrical engineering from Rensselaer Polytechnic Institute. Bjorn is
a member of IEEE, Tau Beta Pi, and Eta Kappa Nu. 



Foreword 




R.J. Hollingsworth

Manager, Advanced
Semiconductor Development 



Digital has developed and manufactures the world's
fastest production complex instruction set (CISC) 
and reduced instruction set (RISC) microproces- 
sors. The speed of these microprocessors is due, 
in large part, to a complementary metal oxide
semiconductor (CMOS) technology having faster
transistors, dense on-chip wiring, and innovative
performance-enhancing materials and structures. 
This advanced CMOS technology allows Digital the 
unique capability to design and produce micro- 
processors that operate twice as fast as common 
leading-edge devices produced by the world's pre- 
mier semiconductor manufacturers. 

In 1980, Digital recognized the strategic role that 
microprocessor chips played as core elements in 
reshaping and advancing the computer industry. A 
key observation was that the unrelenting advances 
in very large-scale integrated (VLSI) circuits would 
continue to allow vast amounts of logic and mem- 
ory to be economically produced in a single sili- 
con chip, thereby yielding dramatic improvements 
in performance, cost, and reliability. VLSI devices 
were demonstrating yearly improvements of 10 to 
15 percent in gate switching speed and 25 to 35 per- 
cent in density. Microcontamination and process 
control methods coupled with increasing wafer 
size allowed larger chips to be fabricated at lower 
cost. It was clear that the ability to integrate more 
and more function into a piece of silicon was funda- 
mentally changing the computer industry. The era 
of entire computing systems on a single chip was 
rapidly approaching. Chips were not just compo- 
nents in a system, they were becoming the system. 
To fully exploit this and lead Digital into what 
C. Gordon Bell termed a "semi-computer company"
in the 1990s, the decision was made to develop
and subsequently manufacture semiconductor
technologies.

Digital's semiconductor operations group set a
goal in the early 1980s to achieve leadership in the 
development and manufacture of the world's high- 
est performance microprocessors. To meet this, a 
number of strategic positions were taken: 

■ Develop distinct generations of CMOS that would 
produce a wide range of VLSI devices, not only 
microprocessors. 

■ Be at the leading edge in density; be ahead of the 
industry in high-speed switching devices and 
system-level features.

■ Make CMOS technology decisions by optimizing a 
wide range of requirements necessary to meet 
the goal: the world's fastest microprocessors. The
approach would be a rigorous engineering opti- 
mization from computer architecture through 
chip manufacturing processes. 

■ Develop CMOS with a single, multi-disciplined 
technical team dedicated to the project from 
initial conception through manufacturing qualifi- 
cation — a four- to five-year endeavor. 

■ Use the microprocessor product, targeted to 
world-leading performance, as the specific focal
point for CMOS development. Tie together the
efforts of full-time chip architecture, design, test, 
reliability, packaging, and manufacturing people 
for the full project duration, i.e., four to five 
years. 

■ Develop CMOS technology in conjunction with 
the microprocessor architecture and design — 
an essential ingredient in delivering leading VLSI 
chips to the market first. This "concurrent" 
approach has been a mainstay in Digital's CMOS 
development since the early 1980s. 

■ Participate in, contribute to, and draw upon the
best semiconductor research in the world. 

Many leading semiconductor companies follow 
these strategies. Digital, however, is unique in prac- 
ticing all of them. 

To help guide the technical direction, a CMOS 
technology road map was created in the early 1980s. 
It defined the key pacing elements that delineate 
each distinct CMOS generation: minimum feature 
size, switching speed, manufacturable chip and 
wafer size, and other attributes necessary to deliver 
leading-edge microprocessors. This roadmap set 
the goals for the organization and allowed easy 
reference to competitive trends. In its simplest
form, the roadmap defined each CMOS generation
by making logic gates switch 30 percent faster 
while occupying half the silicon area and by inte- 
grating these elements on chips that were growing 
40 percent in area compared with the previous gen- 
eration. The roadmap today defines eight genera- 
tions of CMOS; four have been introduced to 
manufacturing (CMOS-1 through CMOS-4), and two 
are now under development (CMOS-5 and CMOS-6).

The papers in this issue of the Digital Technical 
Journal are focused on Digital's fourth-generation 
CMOS (CMOS-4) that is used to build a wide variety 
of VLSI chips, most notably the NVAX and Alpha 
21064 microprocessors. CMOS-4, presently being 
manufactured in Digital's semiconductor facilities 
in Hudson, MA, and South Queensferry, Scotland, 
delivers microprocessors with up to 1.7 million
transistors operating at clock rates up to 200 MHz.
A variety of materials and structures have been 
crafted into CMOS-4, allowing advanced system- 
level capabilities not available in other state-of-the- 
art CMOS processes. 

This issue spans the breadth of technical areas 
necessary to make CMOS-4 a successful element 
in establishing Digital's preeminent position in 
microprocessor technology. The discussions herein 
include how manufacturing process and chip 
design trade-offs are made; how a large number 
of complex manufacturing steps are integrated; 
how leading-edge speed, density, and materials are 
achieved; and what modeling, simulation, and
measurements are critical to ensure reliability and
producibility. The papers are a sample of the range
of technological achievements in Digital's semi- 
conductor operations. 
Microprocessor Performance
and Process Complexity in 
CMOS Technologies 

Digital's CMOS technology is characterized by a scaling methodology that doubles
the gate density and improves the gate speed by approximately 30 percent with 
each new generation. Decreasing feature size from one generation of CMOS tech- 
nology to the next is fundamental to improving the performance of VLSI chips. Each 
of Digital's successive CMOS generations has added new technology features to 
improve performance further. Digital's latest, qualified CMOS technology incorporates
features such as low voltage operation, low-resistance topside substrate contacts,
low-resistance transistor gate material, local interconnects in SRAMs, three
levels of metal interconnect, and fuses for redundancy. 



The goal of Digital's semiconductor organization is
to provide leadership in product performance and
functionality, as most recently evidenced by the
Alpha 21064 and NVAX microprocessors. 1, 2 Internal
development of complementary metal-oxide semi- 
conductor (CMOS) processes has been crucial to the 
success of these chips because it allowed us to 
design the process to meet very specific needs. The 
identification and fulfillment of these needs has been 
a multigenerational, ongoing task that closely links 
the chip design effort with the process development. 

Each new generation of CMOS technology is scaled 
to double the gate density and improve the gate 
speed by 30 percent. In addition to the generation- 
to-generation improvements that are derived from 
scaling, a CMOS process that is designed specifically 
for high-performance microprocessor applications 
requires a number of features beyond those nor- 
mally implemented in CMOS processes. As a new 
process is being developed, proposed new features 
are critically evaluated in order to arrive at the opti- 
mum trade-off between chip performance and pro- 
cess complexity. 

This paper describes Digital's CMOS processes 
from the perspective of those process features that 
contribute to the performance and functionality 
of high-speed microprocessors. It begins with a 
short discussion on microprocessor architecture, 
which strongly influences the direction of process 
development. 



CMOS Microprocessors — 
General Considerations 

Several factors determine the performance of the 
fastest microprocessor that can be built in a given 
CMOS technology. The performance of a micro- 
processor is inversely proportional to the product 
of clock cycles per instruction (CPI) and the
machine cycle time. From one generation of micro- 
processors to the next, improvements in both CPI 
and machine cycle time are required in order to 
meet the performance goal. CPI depends on the 
microarchitecture as well as the mix of instruc- 
tions executed. The minimum machine cycle time 
depends only on circuit and microarchitectural 
issues. 

Improvements in CPI are achieved in part by 
pipelining and in part by adding cache memory to 
the die. As shown in Figure 1, pipelining is defined 
as the simultaneous execution of two or more 
instructions; for example, the first part of a two- 
part instruction is executed at the same time as the 
second part of the previous instruction.
Simultaneous execution reduces the CPI; the result
is higher performance at the expense of more
circuitry. 4 Adding on-chip cache memory also
reduces the CPI. The microprocessor can quickly 
access instructions or data resident in its caches, 
rather than wait for this information to be trans- 
mitted from on-board random-access memory 
(RAM) chips. 
Figure 1 Relationship between CPI and
Pipelining 

As the minimum feature size decreases, transistor 
density increases. Thus, for a given die size, a 
0.75-micrometer (μm) minimum feature size
technology (CMOS-4) can support four times as many
transistors as a 1.5-μm technology (CMOS-2). In
addition, advances in process technology lead to
increases in the size of the largest die that can be 
built with an acceptable manufacturing yield. 

Microprocessor designers take advantage of the 
extra transistors to increase the degree of pipe- 
lining and the size of caches. This reduces the CPI, 
which boosts the performance of the machine. 
Table 1 shows the difference between two genera- 
tions of scaled CMOS processes. The halving of the 
feature size has been augmented by a larger die size 
and microarchitectural changes that increase the 
SPECmark 5 performance by a factor of 4.8.
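
As a rough check on the factor of 4.8, the inverse relationship between performance and the product of CPI and cycle time can be evaluated directly. The sketch below uses the CPI and cycle-time values quoted in Table 1; the function name is an illustrative assumption, and the result is only an estimate of relative instruction throughput, not a SPECmark measurement.

```python
# Relative performance ~ 1 / (CPI * cycle time).
# CPI and cycle-time values are the ones quoted in Table 1;
# the function name is hypothetical.

def relative_performance(cpi: float, cycle_time_ns: float) -> float:
    """Return instructions per nanosecond, i.e. 1 / (CPI * cycle time)."""
    return 1.0 / (cpi * cycle_time_ns)

rex520 = relative_performance(cpi=11.95, cycle_time_ns=28.0)  # CMOS-2
nvax = relative_performance(cpi=5.85, cycle_time_ns=12.0)     # CMOS-4

speedup = nvax / rex520
print(f"Estimated NVAX / REX520 speedup: {speedup:.2f}")
```

The estimate lands close to the ratio of the two SPECmark figures in Table 1 (40.5 / 8.5), which suggests that the CPI and cycle-time improvements together account for essentially all of the measured gain.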



Generation-to-Generation Scaling 

The pace of improvement in microprocessor per- 
formance indicated by the REX520/NVAX compari- 
son is ahead of the industry. 6 Microprocessor 
performance has been doubling approximately 
every two years, and there is no evidence that this 
pace of change is slackening. The shrinking of 
feature sizes with each new generation of CMOS 
process technology and the increase in yieldable 
die sizes have enabled this rapid improvement in 
the performance of very large-scale integration 
(VLSI) chips. 

At the center of the reduction in the feature sizes 
of CMOS processes is the miniaturization of the MOS 
transistor. Over the past 15 years, a set of rules, 
known as scaling theory, has been developed to 
guide this process. 7, 8, 9

In the fundamental form of scaling, called con- 
stant field scaling, the transistor's physical parame- 
ters and the power supply voltage are kept 
proportional to the feature sizes to maintain the 
magnitude and the contours of the electric fields 
within the transistor. All the dimensions of the tran- 
sistor, e.g., length, width, gate dielectric thickness, 
and source/drain junction depths, and the power 
supply and threshold voltages are reduced by the 
scaling factor, [1/k (where k is greater than 1)], 
while the doping concentrations are increased by 
k. These rules are also extended, with some excep- 
tions, to guide the miniaturization of the intercon- 
nect. In practice, scaling theory is not followed 
exactly, for reasons of both performance and stan- 
dardization that are discussed below. Digital's 
implementation of scaling through four CMOS gen- 
erations is shown in Table 2. 
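
The constant field scaling rules described above can be sketched as a small function: every dimension and voltage is divided by k, and the doping concentration is multiplied by k. The starting device values below are hypothetical and chosen only for illustration; they are not taken from Table 2, and, as the text notes, the actual processes do not follow ideal scaling exactly.

```python
from dataclasses import dataclass
import math

@dataclass(frozen=True)
class TransistorParams:
    gate_length_um: float
    gate_width_um: float
    gate_dielectric_angstrom: float
    junction_depth_um: float
    supply_voltage_v: float
    threshold_voltage_v: float
    doping_per_cm3: float

def constant_field_scale(p: TransistorParams, k: float) -> TransistorParams:
    """Apply ideal constant field scaling with factor 1/k (k > 1)."""
    if k <= 1.0:
        raise ValueError("scaling constant k must be greater than 1")
    return TransistorParams(
        gate_length_um=p.gate_length_um / k,
        gate_width_um=p.gate_width_um / k,
        gate_dielectric_angstrom=p.gate_dielectric_angstrom / k,
        junction_depth_um=p.junction_depth_um / k,
        supply_voltage_v=p.supply_voltage_v / k,
        threshold_voltage_v=p.threshold_voltage_v / k,
        doping_per_cm3=p.doping_per_cm3 * k,  # doping increases by k
    )

def vertical_field(p: TransistorParams) -> float:
    """Supply voltage across the gate dielectric, in volts per angstrom."""
    return p.supply_voltage_v / p.gate_dielectric_angstrom

# Hypothetical starting device; numbers are illustrative only.
old = TransistorParams(1.5, 3.0, 210.0, 0.30, 5.0, 0.9, 1.0e16)
new = constant_field_scale(old, k=2.0)
print(new)
print(vertical_field(old), vertical_field(new))
```

Because voltage and dielectric thickness shrink by the same factor, the field across the gate dielectric is unchanged, which is the property the scaling rules are meant to preserve.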

Improved Performance through Scaling 
The reduced feature sizes made possible by scal- 
ing have a major impact on node capacitance and, 
hence, the speed of the chip. The minimum cycle 
time of a microprocessor is directly proportional
to the capacitances of the gates, sources, drains, 
and interconnect. The gate capacitance is inversely 
proportional to the thickness of the gate dielectric, 
and transistors with thinner gate dielectrics have 
higher drive current. Since the minimum cycle time 
is a stronger function of transistor drive current 
than gate dielectric capacitance, the trade-off 
should be made in favor of a thinner gate dielectric. 

In Digital's family of CMOS processes, gate
capacitances have scaled with minimum feature size.
Table 1 Comparison of Single Chip VAX Microprocessors

         Minimum            Tape                               Cycle Time  Chip
         Feature            Out     Performance  Cycles per    (Nano-      Size     No. of
Process  Size      Chip     Date    SPECmarks*   Instruction†  seconds)    (Mils)   Transistors

CMOS-2   1.5 μm    REX520   Sep 87  8.5          11.95         28          460×460  320,000
CMOS-4   0.75 μm   NVAX     Nov 90  40.5         5.85          12          636×574  1,300,000

Notes:

* These are combined integer and floating point SPECmarks run on a VAX 6000 Model 410 (REX520) and a VAX 6000 Model 610 (NVAX).

† CPI depends not only on the CPU chip, but also on the memory subsystem and the particular program being executed. The CPI values
quoted here are a composite for the ten benchmark programs in the SPECmarks suite.



Table 2 Comparison of Feature Sizes in CMOS Generations
From the 1.5-μm minimum feature size of CMOS-2
technology to the 0.75-μm size of CMOS-4, the area
of the gates was scaled by a factor of four and the
gate dielectric thickness was halved. The result is a
twofold reduction in gate capacitance ($C = \varepsilon A / t$,
where $A$ is the gate area and $t$ is the gate
dielectric thickness). The typical gate dielectric
thickness in the 0.75-μm CMOS-4 process is
105 angstroms (Å). Manufacturability
and reliability considerations have been the
major factors determining the minimum gate 
dielectric thickness used for each generation. 
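
The twofold reduction follows directly from the parallel-plate relation. A minimal sketch, assuming a silicon dioxide gate dielectric (relative permittivity 3.9) and hypothetical gate widths; only the 105-angstrom thickness and the two minimum feature sizes come from the text, with the CMOS-2 thickness taken as twice the CMOS-4 value:

```python
import math

EPS0_F_PER_M = 8.854e-12   # permittivity of free space
K_DIELECTRIC = 3.9         # assumption: SiO2 gate dielectric

def gate_capacitance_f(width_um: float, length_um: float,
                       thickness_angstrom: float) -> float:
    """Parallel-plate gate capacitance C = eps * A / t, in farads."""
    area_m2 = (width_um * 1e-6) * (length_um * 1e-6)
    thickness_m = thickness_angstrom * 1e-10
    return EPS0_F_PER_M * K_DIELECTRIC * area_m2 / thickness_m

# Hypothetical minimum-size gates; widths are illustrative only.
c_cmos2 = gate_capacitance_f(width_um=3.0, length_um=1.5,
                             thickness_angstrom=210.0)
c_cmos4 = gate_capacitance_f(width_um=1.5, length_um=0.75,
                             thickness_angstrom=105.0)

ratio = c_cmos2 / c_cmos4
print(f"C(CMOS-2) / C(CMOS-4) = {ratio:.2f}")
```

The area term contributes a factor of four and the thinner dielectric gives back a factor of two, so the ratio does not depend on the assumed permittivity or widths as long as both dimensions are halved.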

As shown in Figure 2, the sources and drains of 
n-channel metal-oxide semiconductor (NMOS) tran- 
sistors form N+/P diodes to the substrate. Since 
the p-type doped substrate is held at a potential of
$V_{ss}$ (ground) and (during normal operation) the
sources and drains are always at $V_{ss}$ or higher, these
diodes are always reverse biased and act as
voltage-dependent capacitors. The sources and drains of
p-channel metal-oxide semiconductor (PMOS)
transistors form P+/N diodes to the n-wells; the n-well
is held at $V_{dd}$ (power supply voltage).

The capacitance of a reverse-biased diode is a 
function of its shape and size: there is both an 
area component and a perimeter component. 

$$C_{total} = C_{area} \times \text{Area} + C_{perimeter} \times \text{Length of Perimeter}$$

Since the area scales with the square of the
minimum feature size and the perimeter scales directly
with the feature size, $C_{total}$ scales by somewhat
more than the minimum feature size: the exact
amount depends on the shape of the source or
drain. For the NVAX microprocessor, which was
designed in CMOS-4, the area and perimeter
components contribute about equally to $C_{total}$. In future
technology generations, the perimeter component
will tend to dominate.
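
The way the two components scale can be illustrated numerically. The sketch below assumes the per-unit-area and per-unit-length coefficients stay fixed when the layout shrinks (a simplification, since doping changes between generations) and uses a hypothetical linear shrink factor k = 2 applied to a diode whose area and perimeter components start out equal, as stated for NVAX.

```python
def scaled_capacitance_fraction(area_share: float, k: float) -> float:
    """Fraction of the original C_total left after a 1/k linear shrink.

    area_share is the fraction of C_total due to the area component;
    the remainder is the perimeter component. Per-unit coefficients are
    assumed constant (a simplifying assumption, not from the source).
    """
    perimeter_share = 1.0 - area_share
    return area_share / k**2 + perimeter_share / k

def perimeter_share_after(area_share: float, k: float) -> float:
    """Perimeter component as a fraction of the new C_total."""
    perimeter_share = 1.0 - area_share
    return (perimeter_share / k) / scaled_capacitance_fraction(area_share, k)

k = 2.0  # hypothetical shrink factor
fraction = scaled_capacitance_fraction(area_share=0.5, k=k)
new_perimeter_share = perimeter_share_after(area_share=0.5, k=k)
print(fraction, new_perimeter_share)
```

The remaining capacitance falls between the 1/k² of a pure area term and the 1/k of a pure perimeter term, and the perimeter share grows after the shrink, which is consistent with the expectation that the perimeter component dominates in later generations.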

In Digital's CMOS processes, metal interconnect
widths and spaces are scaled with the minimum
feature size, but metal thicknesses and dielectric
thicknesses are held constant to avoid three undesirable
effects. Scaling the metal thickness would increase
the sheet resistance (leading to larger power supply
voltage drops and RC time constant delays) and
decrease the current-carrying capability of the
conductor lines (from an electromigration viewpoint).
Scaling the interconnect dielectric thickness would
increase the capacitance per unit area. Because the
thicknesses of conductor lines and dielectric layers
are not scaled, the aspect ratios of the spaces
between the conductors and the contacts or vias
between interconnect layers increase. This makes
fabrication more difficult.

Figure 2 Diagram of NMOS and PMOS
Transistors Showing Gate,
Source, and Drain for CMOS-4

As with sources and drains, interconnect capaci- 
tance has both an area component, which scales 
quadratically, and a perimeter component, which 
scales linearly. Consequently, the total capacitance 
of interconnect scales by somewhat more than the 
minimum feature size. Because neither the inter- 
connect thickness nor the dielectric thickness is 
scaled, the capacitance between adjacent conduc- 
tor lines increases. The result is an increased suscep- 
tibility to cross-talk between adjacent bus signals, 
which is shown in Figure 3. In the NVAX micro- 
processor, greater-than-minimum spaces were used 
on some critical buses to reduce cross-talk. 
Note: $C_{t3}$ and $C_{t4}$ are the total capacitances for the center line for CMOS-3 and CMOS-4, respectively.
Dimensions are typical.



Figure 3 Cross Section of Three Minimum-spaced Metal 1 Lines Drawn to Scale for CMOS-3 and CMOS-4 
The use of industry-standard power supply
voltages results in a significant violation of constant
field scaling rules. Nevertheless, power supply
voltage is generally held constant across two or more
process generations [5.0 volts (V) in CMOS-1 and 
CMOS-2 and 3.3 V in CMOS-3 and CMOS-4] in order 
to maintain voltage compatibility with industry- 
standard chips such as RAMs. However, a nonscaled 
power supply voltage presents formidable chal- 
lenges for the design of reliable transistors. 

Developing CMOS for Microprocessors 

The particular implementation of transistors, inter- 
connect, and special circuit elements in a CMOS 
process depends on the application. For Digital's 
high-speed microprocessors, performance, as
measured in SPECmarks, is crucial. 5 In addition to
optimizing the transistors for maximum drive current,
performance in this application can be improved 
by adding process features to provide denser 
on-chip cache static RAM (SRAM), interconnect 
with high current capability, and precision resistors 
for impedance matching. As discussed above, per- 
formance can also be improved by increasing the 
die size. By contrast, a major part of the effort in 
designing a process for dynamic RAMs (DRAM) is 
directed toward developing a very small, high- 
capacitance memory element. 

In addition to performance, the planned
production volume is an important factor in
defining a CMOS process. A process for low- to
moderate-volume, high-performance microprocessors differs
from a process optimized for fast turnaround gate 
arrays or high-volume RAMs. In a high-volume prod- 
uct, a great deal of effort is devoted to reducing the 
total number of process steps. For example, com- 
pensating blanket implants are often used to set the 
thresholds of the transistors to decrease the num- 
ber of photolithographic masking steps. This 
approach couples the parameters for the NMOS and 
PMOS transistors, making parameter adjustments 
more difficult and requiring more development 
effort. 

To produce high-performance microprocessors, 
Digital has developed many unique features for its 
CMOS technologies. Table 3 lists the new technol- 
ogy features that have been developed for each pro- 
cess generation to meet increasingly demanding 
performance requirements. We begin our discus- 
sion of the implementation of these features and 
the requirements for reliable circuit operation by 
addressing the issue of power dissipation in high- 
speed microprocessors. 

Power Supply Voltage 

It is well known that CMOS power dissipation is
dominated by $C \times V_{dd}^{2} \times f$, where $C$ is the switched
capacitance, $V_{dd}$ is the power supply voltage, and $f$
is the clock frequency. A reduction in $V_{dd}$ is an
excellent way to counteract the increase in power due to
the higher frequencies and larger switched
capacitance (which results from the increase in die size).
When developing the CMOS-3 process, we chose to
reduce the power supply voltage from the industry
norm of 5 V to the Joint Electron Device
Engineering Council (JEDEC) 3.3 V standard. 10, 11

Table 3 Features Added by Generation for CMOS-1 to CMOS-4

With the CMOS-4 process specified for 3.3-V
power supply voltage, the NVAX and the Alpha
21064 microprocessors consume 16 watts (W) at
100 megahertz (MHz) and 27 W at 200 MHz,
respectively. If the supply voltage were 5.0 V, the power
would scale by $(5\ \mathrm{V})^{2} / (3.3\ \mathrm{V})^{2} = 2.3$. This increase in
power dissipation would have greatly increased the
complexity and cost of the chip packages.
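
The factor of 2.3 and its consequence for the two chips can be reproduced with a few lines. This is a sketch that assumes the switched capacitance and clock frequency are unchanged when only the supply voltage is raised; the function names are illustrative.

```python
def dynamic_power_w(switched_capacitance_f: float, vdd_v: float,
                    frequency_hz: float) -> float:
    """Dominant CMOS dynamic power term: C * Vdd^2 * f."""
    return switched_capacitance_f * vdd_v**2 * frequency_hz

def power_ratio(vdd_new_v: float, vdd_old_v: float) -> float:
    """Power multiplier when only Vdd changes (C and f held fixed)."""
    return (vdd_new_v / vdd_old_v) ** 2

ratio = power_ratio(vdd_new_v=5.0, vdd_old_v=3.3)

# Power figures at 3.3 V are the ones quoted in the text.
nvax_at_5v_w = 16.0 * ratio    # NVAX, 100 MHz
alpha_at_5v_w = 27.0 * ratio   # Alpha 21064, 200 MHz

print(f"ratio = {ratio:.2f}")
print(f"NVAX at 5 V: {nvax_at_5v_w:.1f} W")
print(f"Alpha 21064 at 5 V: {alpha_at_5v_w:.1f} W")
```

Under these assumptions the two chips would dissipate roughly 37 W and 62 W at 5 V, which illustrates why the higher supply voltage would have complicated the package design.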

Significant changes to the NMOS and PMOS
transistors were necessary to optimize the process for
operation at 3.3 V. The most visible parameter
change was a lowering of the target threshold
voltages for the NMOS and PMOS transistors by about
|0.4| V to 0.5 V and -0.5 V, respectively. To explain
why this is necessary, we must consider the
dependence of both the nodal transition time (which is a
good measure of circuit performance) and the
transistor currents on $V_{dd}$. The time required to
transition a signal node between the power supply rails is
proportional to the charge ($Q$) on the node and
inversely proportional to the drain-to-source
current ($I_{ds}$) of the driving transistor. Since $Q \propto V_{dd}$ and,
to first order, $I_{ds} \propto V_{dd}^{2}$, the time required to
transition a node is inversely proportional to $V_{dd}$.
However, when second-order effects are considered, the
3.3-V technology is of about the same performance
as the corresponding 5-V technology. The
second-order effects include the benefits from lowering the
threshold voltages of the transistors in the 3.3-V
process and the compromises that would have to be
made to the transistors in the 5-V process to make
them reliable.
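
The first-order argument above can be written out step by step. The symbol $t_{transition}$ is introduced here only for illustration; the proportionalities are the ones stated in the text.

```latex
\begin{align}
t_{transition} &\propto \frac{Q}{I_{ds}}, \qquad
Q \propto V_{dd}, \qquad
I_{ds} \propto V_{dd}^{2} \\
t_{transition} &\propto \frac{V_{dd}}{V_{dd}^{2}} = \frac{1}{V_{dd}} \\
\frac{t_{transition}(3.3\ \mathrm{V})}{t_{transition}(5\ \mathrm{V})}
  &\approx \frac{5}{3.3} \approx 1.5
\end{align}
```

To first order, then, a 3.3-V node would switch about 1.5 times more slowly than a 5-V node; the second-order effects described above are what close this gap in practice.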



Hot Carrier Degradation 
A CMOS transistor that is subjected to excessive volt- 
ages becomes damaged over time by hot carriers. 
Hot carriers are highly energetic current carriers 
that result from the high electric fields in the tran- 
sistor. To date, the NMOS transistor has been more 
susceptible to hot carrier degradation than the 
PMOS transistor. Hot carrier damage gradually 
reduces the saturation current $I_{DSAT}$ of the NMOS
transistor as the damage increases over time. On
chips with a nominal 3.3-V power supply, some
transistors are subjected to source-drain voltage 
transients as high as 4.3 V. Table 4 gives details of 
the origins of these high voltage transients. 

Hot carrier rules for the CMOS-4 process are
illustrated in Figure 4, which shows the three legal
regions of device operation on a plot of $V_{gs}$
(gate-to-source voltage) versus $V_{ds}$ (drain-to-source voltage).
Devices may operate in any, or all, of three regions:
(1) unconditionally safe region, (2) region subject
to turn-on transient rule, and (3) extended safe
region for "off" devices. Devices can spend up to
100 percent of the time in the safe region, no more
than 5 percent in the region subject to turn-on
transient rule, and no more than 10 percent in the "off"
devices safety region.

Figure 4 CMOS-4 Hot Carrier Rules (regions shown: unconditionally safe region; region subject to turn-on transient rule; extended safe region for off devices)

Table 4 Origins of High Voltage Transients

                               V_ds      Subtotal
                               (Volts)   (Volts)    Comments

Power supply                   3.30      3.30       Nominal voltage
Power supply tolerance         0.165     3.465      5% tolerance, includes ripple
On-chip power supply           0.175     3.64       From NVAX SPICE simulations. Half of
ringing due to package                              peak-to-peak noise (V_dd_internal with
inductance                                          respect to V_ss_internal)
Booting above V_dd due         0.66      4.30       Capacitive coupling to susceptible nodes
to capacitive coupling                              is limited to <20% by designers
Total                                    4.30
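
The worst-case voltage budget of Table 4 and the time-in-region limits quoted above lend themselves to a simple check. The sketch below is not the CAD tool described in the paper; it assumes the 0.165-V and 0.66-V entries are 5 percent and 20 percent of the nominal 3.3-V supply, and the region names and function names are hypothetical.

```python
NOMINAL_VDD_V = 3.30

# (origin, contribution in volts); percentages follow Table 4's comments.
contributions_v = [
    ("Power supply (nominal)", NOMINAL_VDD_V),
    ("Power supply tolerance (5%)", 0.05 * NOMINAL_VDD_V),
    ("On-chip ringing (package inductance)", 0.175),
    ("Booting above Vdd (20% coupling)", 0.20 * NOMINAL_VDD_V),
]

def running_subtotals(items):
    """Return the cumulative V_ds after each contribution, in volts."""
    total = 0.0
    subtotals = []
    for _, delta_v in items:
        total += delta_v
        subtotals.append(round(total, 3))
    return subtotals

# Maximum fraction of operating time allowed in each region (from the text).
MAX_TIME_FRACTION = {
    "safe": 1.00,
    "turn_on_transient": 0.05,
    "off_extended": 0.10,
}

def within_time_limits(fractions: dict) -> bool:
    """True if no region exceeds its allowed share of operating time."""
    return all(fractions.get(region, 0.0) <= limit
               for region, limit in MAX_TIME_FRACTION.items())

subtotals = running_subtotals(contributions_v)
print(subtotals)
print(within_time_limits({"safe": 0.90, "turn_on_transient": 0.04,
                          "off_extended": 0.06}))
```

The running subtotals reproduce the Subtotal column of Table 4, ending at the 4.3-V transient quoted in the text.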

A wide variety of NVAX circuits were simulated 
to determine what constraints should be placed 
upon circuit design style in order to ensure that 
the CMOS-4 hot carrier rules were not violated. A 
set of general circuit design constraints was devel- 
oped, and a computer-aided design (CAD) tool 
was written to ensure that all the circuits on the 
NVAX chip observed these constraints. The hot 
carrier CAD checks were run prior to fabrication, 
and circuits that violated the constraints were 
redesigned. 

Electromigration Considerations
If the average current density ($J_{average}$) through
an aluminum conductor line is too high, the
conductor line is susceptible to metal migration. This
phenomenon occurs over time as the electron
current forms voids at one site and deposits metal
downstream. Eventually a short circuit or an open
circuit develops, which results in a circuit failure.
Chip designers guard against electromigration
failure by ensuring that $J_{average}$ for every conductor line
on the chip is lower than the maximum allowed
value.

For a conductor line that is switched every cycle,
the relationship between average current density,
microprocessor cycle time ($t_{cycle}$), $V_{dd}$, and
cross-sectional area is given by

$$J_{average} = \frac{C \times V_{dd}}{t_{cycle} \times \text{Cross-Sectional Area}}$$

It is interesting to note the changes to $J_{average}$
for a conductor line as a chip is shrunk from
one generation to the next. The node capacitance,
$C$, decreases by slightly more than the scaling
factor; $V_{dd}$ remains constant; $t_{cycle}$ reduces by the
scaling factor since the chip can now run faster;
and the cross-sectional area decreases by the
scaling factor. Consequently, $J_{average}$ increases by
slightly less than the scaling factor as the width of
the conductor line shrinks. If $J_{average}$ now exceeds
the maximum allowed value, the circuit must
be redesigned. If there is enough space, $J_{average}$ can
be reduced by widening the conductor line so
that the cross-sectional area is increased. From this
brief analysis, it is clear that it becomes more
difficult to observe the electromigration limits as
the technology scales, even when the metal
thickness is not scaled. As can be seen from the $J_{average}$
equation, reducing $V_{dd}$ from 5.0 V in CMOS-2 to 3.3 V
in CMOS-3 helped to counteract the effect of
scaling on $J_{average}$.

Scaling the interconnect and dealing with
electromigration issues are some of the most formidable
challenges that must be faced as feature sizes
continue to decrease in the next decade.
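
The scaling argument for the average current density can be sketched numerically. The exponent used for the capacitance (1.1, standing in for "slightly more than the scaling factor") and the shrink factor k = 1.5 are hypothetical values chosen for illustration; the cycle time and the cross-sectional area are each assumed to shrink by exactly 1/k, with the metal thickness held constant as described in the text.

```python
def average_current_density(capacitance_f: float, vdd_v: float,
                            cycle_time_s: float, area_m2: float) -> float:
    """J_average = (C * Vdd) / (t_cycle * cross-sectional area), in A/m^2."""
    return capacitance_f * vdd_v / (cycle_time_s * area_m2)

def j_average_multiplier(k: float, cap_exponent: float = 1.1,
                         vdd_ratio: float = 1.0) -> float:
    """Ratio J_new / J_old for a 1/k shrink.

    C scales as k**(-cap_exponent) (assumed exponent), the cycle time
    and the cross-sectional area (width only) each scale as 1/k, and
    Vdd changes by vdd_ratio.
    """
    cap_scale = k ** (-cap_exponent)
    cycle_scale = 1.0 / k
    area_scale = 1.0 / k
    return cap_scale * vdd_ratio / (cycle_scale * area_scale)

k = 1.5  # hypothetical shrink factor
same_vdd = j_average_multiplier(k)
lower_vdd = j_average_multiplier(k, vdd_ratio=3.3 / 5.0)
print(same_vdd, lower_vdd)
```

With the supply held constant, the multiplier comes out above 1 but below k, matching the statement that the current density increases by slightly less than the scaling factor; lowering the supply from 5.0 V to 3.3 V reduces the multiplier in proportion.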

Substrate Contact 

As mentioned earlier, the substrate must be connected
to the $V_{SS}$ of the chip through a low-impedance
path to prevent any rise in voltage. If
the substrate voltage rise is severe, NMOS source/drain
diodes will conduct, and if sufficient charge
is injected, the chip may latch up. Latch-up is a
destructive mechanism involving the parasitic
bipolar transistors formed by the CMOS process.
The process, circuits, and $V_{SS}$ substrate contact are
designed to prevent latch-up from occurring.

The usual industry substrate connection method
depends on a path through bond wires and the
package to connect between internal $V_{SS}$ and the
substrate. To ensure a good substrate contact,
Digital's CMOS technologies incorporate a deep P+
implant (DPI) around the edge of the die to connect
the metal on the die surface to the low-resistance
substrate. The implant creates a low-resistance
path through the P-epitaxial layer in which the
NMOS and PMOS transistors are formed.

The DPI is a low-inductance path when com- 
pared to the standard method that connects the 
substrate through the package. The additional area 
enclosed by the path through the bond wires and 
package implies greater inductance, which is unde- 
sirable for high-frequency signals. The DPI path 
between $V_{SS}$ and substrate has low inductance
because it is made directly on-chip. 

Technology-limited Gate Dielectric 
Thickness 

As stated above, maximizing the current that a transistor
can supply at a given $V_{DD}$ is of utmost
importance for circuit performance. Scaling the
transistor gate length, gate dielectric thickness, and 
threshold voltage improves the drive current. The 
transistor gate length is constrained by the mini- 
mum polysilicon line-width feature; the minimum 
threshold voltage is set by the leakage current 
allowed when the transistor is turned off. However, 
scaling does not establish a fixed relationship 
between feature size and gate dielectric thickness; 
scaling only determines the change from one generation
to the next.

Figure 5 shows how the saturation currents for
both NMOS and PMOS transistors in CMOS-4 depend
on the gate dielectric thickness. The dielectric
thickness range plotted spans applications from
microprocessors to SRAMs. Both curves are only
slightly sublinear; thinning the gate dielectric
provides almost a one-to-one return in transistor
saturation current. In high-performance microprocessor
applications, reliability and manufacturability
considerations determine the extent to
which the gate dielectric thickness can be reduced.
Digital's CMOS technologies have consistently used
thinner gate dielectrics than industry norms.

Figure 5 Saturation Current as a Function of Gate Dielectric
Thickness for CMOS-4



Silicided Source/drain and Gate
The basic gate material for an MOS transistor is 
highly doped polysilicon. The sheet resistivity of 
this material in the CMOS-1 process was 40 ohms 
per square. For CMOS-2, the RC time constant delay 
associated with this sheet resistance would have 
created nonuniform turn-on of wide, fast-switching 
output transistors. A tungsten silicide layer was 
added to the polysilicon to form a polycide. The 
sandwich structure reduced the sheet resistance 
of the gate material to 3 ohms per square. 
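
To see why the sheet resistance matters for wide transistors, the gate can be treated as a distributed RC line driven from one end. The sketch below uses the standard RC/2 (Elmore) estimate for such a line; the oxide capacitance and the gate width are assumed values chosen for illustration, and only the 40 and 3 ohms-per-square figures come from the text.

```python
def gate_rc_delay(sheet_resistance, cox, width):
    """Distributed-RC (Elmore) delay estimate for a gate driven from one end.

    sheet_resistance -- gate material sheet resistance, ohms per square
    cox              -- gate capacitance per unit area, F/m^2
    width            -- transistor width (length of the gate line), m

    R = sheet_resistance * width / length and C = cox * width * length,
    so the channel length cancels and the delay is R * C / 2.
    """
    return 0.5 * sheet_resistance * cox * width ** 2


COX = 2.3e-3      # F/m^2, assumed (roughly a 15-nm oxide); not from the text
WIDTH = 100e-6    # m, assumed width of one output-transistor gate finger

poly = gate_rc_delay(40.0, COX, WIDTH)      # doped polysilicon, 40 ohms/sq
polycide = gate_rc_delay(3.0, COX, WIDTH)   # polycide, 3 ohms/sq
print(f"polysilicon: {poly * 1e12:.0f} ps, polycide: {polycide * 1e12:.1f} ps")
```

Because the delay grows with the square of the gate width, the improvement from the lower sheet resistance is most visible on the widest devices, which is consistent with the concern about output transistors.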

Changes to the transistor process for CMOS-3
technology, which were continued into CMOS-4,
required development of a new silicided gate process.
The new process, known as salicide for self-aligned
silicide, forms the silicide on the gate and
on the source/drain regions after all the required
transistor implants have been completed. This
reduces the sheet resistance of the source/drain
regions by more than an order of magnitude and
allows them to be considered for use in local signal
routing. The reduced sheet resistance, however,
does little to improve the current drive of the
transistor; for a typical CMOS-4 NMOS transistor,
MINIMOS simulations show that the use of silicided
source/drain regions improves the saturation current
by only 0.6 percent [12].

Precision Resistor 

Although transistors are the dominant element in 
logic design, a resistor is sometimes needed, for 
example, to match the impedance of an output 
driver with the impedance of a board-level trans- 
mission line that it drives. MOS transistors make 
poor controlled impedance drivers because they 
change impedance as a function of drain voltage. 
One method of controlling the impedance of a 
driver is to use a diffusion resistor as the dominant 
element, as shown in Figure 6. The MOS transistors 
are sized such that their on-state resistance is much 
lower than that of the resistor. Therefore, if the 
transmission line impedance is 50 ohms, the resis- 
tor plus transistor impedance can be sized to match 
that value with little influence from the variations 
in transistor impedance. 
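
The benefit of letting the resistor dominate can be illustrated numerically. In the sketch below, the split of the 50-ohm target into a 45-ohm resistor and a 5-ohm transistor, and the 40 percent swing in transistor on-resistance, are assumptions made for illustration; the resistor's own tolerance is ignored.

```python
def driver_impedance(r_resistor, r_transistor):
    """Series impedance of the diffusion resistor and the on-state transistor."""
    return r_resistor + r_transistor


def impedance_spread(r_resistor, r_transistor_nominal, variation):
    """Fractional +/- spread of the driver impedance when only the
    transistor on-resistance varies by +/- `variation` (e.g. 0.4 = 40%)."""
    nominal = driver_impedance(r_resistor, r_transistor_nominal)
    high = driver_impedance(r_resistor, r_transistor_nominal * (1 + variation))
    low = driver_impedance(r_resistor, r_transistor_nominal * (1 - variation))
    return (high - low) / (2 * nominal)


# Assumed split of a 50-ohm target: 45-ohm resistor + 5-ohm transistor.
with_resistor = impedance_spread(45.0, 5.0, 0.4)
# For comparison, a driver made of a 50-ohm transistor alone.
transistor_only = impedance_spread(0.0, 50.0, 0.4)
print(f"resistor-dominated: +/-{with_resistor:.0%}")
print(f"transistor only:    +/-{transistor_only:.0%}")
```

Under these assumptions the same transistor variation moves the total driver impedance by a tenth as much when the resistor supplies 90 percent of the impedance.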

Resistors are constructed from nonsilicided diffusion
to meet tolerance requirements that would
not be possible with a silicided version. Because a
silicided resistor has lower sheet resistance, it is
much longer and narrower than a nonsilicided
resistor of the same value. A narrow resistor is
more susceptible to variations in field dielectric
encroachment, which degrades the tolerance of the
resistor. The tolerance is degraded further by process
variation; it is more difficult to control the
sheet resistance of the silicided diffusion than that
of the nonsilicided diffusion.

Figure 6 Precision Resistor Use in
Impedance Matching

The precision resistor is also an important element
of our electrostatic discharge (ESD) protection
strategy. Figure 7 shows a simple schematic of
an I/O driver with the ESD protection components.
Clamp 1 is the main path for shunting current during
an ESD event. The position and construction of
resistor 1 (R1) and clamp 1 ensure that the clamp
impedance is lower than that of a shunting path
through R1 and the driver. R1 is also the impedance
matching resistor for the output driver. R2 provides
an additional level of protection for the gates of the
input driver in conjunction with the smaller clamp 2
placed near the input driver. Both R1 and R2 are
fabricated using the precision resistor mask layer.

Figure 7 Precision Resistor Use in Electrostatic
Discharge Protection

Local Interconnect in the SRAM 
As stated earlier, one of the microarchitectural 
methods for increasing performance is to build as 
large a cache memory as possible on the die. Due 
to fabrication complexity, RAM-specific process 
enhancements are generally not implemented in a 
process tailored to microprocessors. However, the 
CMOS-4 technology does include one feature, called 
local interconnect, that significantly decreases the 
cell size. In the CMOS-4 implementation, local interconnect
is a titanium-nitride (TiN) layer that provides
direct contact between the polysilicon and
diffusion layers [13].

Figure 8 shows a comparison of layouts of memory
cells with and without local interconnect. The
six-transistor static cell layouts show an array of
four cells with the PMOS load transistors at the top
and bottom of the arrays and the NMOS pass transistors,
which provide access to the bit lines, in the
center of the array. The cell without local interconnect
is 120 μm² in CMOS-4, compared to 98 μm² for
the cell with local interconnect. The 18 percent
improvement in area is important, but the cell also
has features that increase yield. There are only
2.5 contacts between the first level of aluminum
interconnect (M1) and diffusion or polysilicon for
the local interconnect cell, compared to 8 in the
non-local interconnect cell. None of the M1 in the
local interconnect cell is at minimum pitch (where
pitch equals M1 width plus space), while all the
M1 in the other cell is at minimum pitch. Finally,
the M2 pitch is smaller in the local interconnect
cell, but at 3.19 μm, it is still greater than the minimum
of 2.63 μm allowed by the technology. All of
these factors add up to a significantly more yieldable
cell; this is important since 15 percent of the
area and about two-thirds of the transistors on the
Alpha 21064 chip are SRAM cells.
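
The quoted cell figures can be cross-checked with simple arithmetic. The short sketch below uses only the numbers given above; the variable names are arbitrary.

```python
AREA_WITHOUT_LI = 120.0  # um^2, cell without local interconnect (from the text)
AREA_WITH_LI = 98.0      # um^2, cell with local interconnect (from the text)
M2_PITCH_CELL = 3.19     # um, M2 pitch in the local interconnect cell
M2_PITCH_MIN = 2.63      # um, minimum M2 pitch allowed by the technology

# Fractional area saved by the local interconnect cell.
area_saving = (AREA_WITHOUT_LI - AREA_WITH_LI) / AREA_WITHOUT_LI
# How far the cell's M2 pitch sits above the technology minimum.
pitch_margin = M2_PITCH_CELL / M2_PITCH_MIN - 1.0

print(f"area saving: {area_saving:.1%}")
print(f"M2 pitch margin over minimum: {pitch_margin:.1%}")
```

The area saving rounds to the 18 percent stated in the text, and the M2 pitch in the local interconnect cell remains comfortably above the minimum.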

Thick Metal 3 Interconnect 

High-speed operation of the dense circuitry in large 
microprocessors results in a level of power dissipa- 
tion not encountered in gate arrays or RAMs. The 
interconnect of a microprocessor has to carry tens 
of amperes of instantaneous current into and out 
of the chip in addition to routing signals. Further- 
more, since high-performance microprocessor 
clock frequencies are of the order of hundreds of 
megahertz, the on-chip clocks must be distributed 
with very low RC time delay constants. These 
requirements lead to a number of differences 
between the interconnects used for microprocessors,
gate arrays, and RAMs. RAMs at the 1-μm
feature size are usually designed with two levels
of aluminum-based conductors (M1 and M2) that
are very similar in thickness (approximately 1 μm),
minimum width, and spacing. Gate arrays generally
add a third level of aluminum-based interconnect
(M3) to improve signal routing and gate utilization
in the array. This level has very similar characteristics
to M1 and M2. Since the transistor density in a
gate array is low and not all the gates are used in a 
design, the power dissipation is usually moderate 
by microprocessor standards. 

The first two layers of interconnect on a high-performance
microprocessor are very similar to the
corresponding layers on an SRAM or gate array.
However, because of the high power supply currents
and low skew clock distribution required for
high-speed operation, M3 in CMOS-3 and CMOS-4
processes is approximately 2.5 times thicker. To
avoid an impact on yield, the pitch of M3 is chosen
to be approximately three times larger than that
for M2. To reduce capacitance to M1 and M2, the
dielectric under M3 is thicker than that between
M1 and M2.

Figure 8 Six-transistor SRAM Cell Layout
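
The effect of the thicker, wider M3 on line resistance can be estimated from the resistance of a rectangular conductor. The sketch below is a rough illustration: the aluminum resistivity, the M2 line width, and the assumption that M3 width grows in proportion to its pitch are all assumed rather than taken from the process.

```python
def resistance_per_length(resistivity, width, thickness):
    """Resistance per unit length of a rectangular conductor, ohms/m."""
    return resistivity / (width * thickness)


RHO_AL = 2.7e-8      # ohm*m, approximate bulk aluminum resistivity (assumed)
M2_THICKNESS = 1e-6  # m, assumed from the "approximately 1 um" quoted above
M2_WIDTH = 1.3e-6    # m, assumed (about half of a 2.63-um pitch)

m2 = resistance_per_length(RHO_AL, M2_WIDTH, M2_THICKNESS)
# M3: 2.5 times thicker; width assumed to grow with the 3x larger pitch.
m3 = resistance_per_length(RHO_AL, 3.0 * M2_WIDTH, 2.5 * M2_THICKNESS)
print(f"M2: {m2:.0f} ohm/m, M3: {m3:.0f} ohm/m, ratio: {m2 / m3:.1f}")
```

Under these assumptions a minimum-width M3 line has 7.5 times lower resistance per unit length than a minimum-width M2 line, which is the property that power and clock distribution rely on.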

Fuses 

As the number of memory cells increases on a 
microprocessor, the impact of those cells on the 
yield of the die increases. CMOS-3 and CMOS-4 tech- 
nologies implement redundancy by a standard tech- 
nique of laser-fusible links to remove bad cells and 
incorporate new ones into the array. Digital's pro- 
cess differs from others in the implementation of 
the fuses, however. Standard RAM processes use 
polysilicon fuses for their small size and ease of 
ablation. Because of the thickness of dielectric lay- 
ers that would need to be etched to uncover the 
fuse, the CMOS-3 and CMOS-4 technologies could 
not use polysilicon fuses. Sufficient control of the 
dielectric etch rate and the selectivity of the dielec- 
tric etch to polysilicon cannot be achieved in a 
manufacturing environment for the thin polysilicon
fuse layer to be left intact. Instead, the
80-nanometer (nm) film of TiN that forms the bottom
layer of M3 is used. The upper layer of aluminum
copper (AlCu) is selectively etched through
a special mask to leave a 1-μm-wide strip of TiN that
can be ablated by industry-standard lasers [14].

Fusible links can be placed on a chip to form an
identification register. A laser can then be used to 
program a unique code into this register. The con- 
tents of the identification register on the NVAX chip 
can be read by system software so that individual 
die can be uniquely identified not only during 
system manufacture, assembly and test, but also in 
the field. 

Manufacturing Process Variations 
and Chip Design Strategy 

As mentioned earlier, the performance of a CMOS
process is a function of the current driving capability
($I_{DS}$) of the PMOS and NMOS transistors, as well as
the capacitances of the gates, sources, drains, and
interconnect that the PMOS and NMOS transistors
must charge and discharge. All of these parameters
vary in manufacturing. In order to ensure that chips
will function correctly and at the planned speed,
chip designers must account for these manufacturing
process variations when the chips are being
designed.

The lot-to-lot variation in characteristics for 
CMOS-4 transistors is significantly larger than for 
bipolar transistors. Figure 9 shows how the saturation
current ($I_{DSAT}$) varies in manufacturing for
CMOS-4. The five points on the plot of PMOS $I_{DSAT}$
versus NMOS $I_{DSAT}$ represent the process extremes
and are often referred to as the process corners.
Each process corner has a two-letter label: FF, TT, 
SS, FS, and SF. The first letter of the pair is used to 
refer to the PMOS device: F indicates a fast (i.e., high 
current) device; S indicates a slow (i.e., low cur- 
rent) device; and T indicates a typical (i.e., manu- 
facturing target current) device. The second letter 
of the pair refers to the NMOS device. Thus, the FF 
point in Figure 9 represents the fastest PMOS device 
paired with the fastest NMOS device; the SS point 
represents the slowest PMOS device paired with the 
slowest NMOS device; and the TT point represents 
a typical PMOS device paired with a typical NMOS 
device. 

Figure 9 Comparison of Saturation Currents
of NMOS $I_{DSAT}$ and PMOS $I_{DSAT}$ for
CMOS-4 Process Corners

The FF SPICE [15] models are used to predict the
speed of the fastest chips, the maximum power dissipation,
the transient current demands on the
power supply, the maximum voltage drops in the
on-chip power and ground routing, the worst-case
current density (checked to ensure that the electromigration
limits are not violated) in power supply
and signal lines, and the maximum rate at which
the chip's signal pins will transition. Power supply
current transients and signal-pin transition rates
have important implications for the electrical
design of chip packages. Both place limits on how
much inductance can be tolerated in the leads. The
maximum power dissipation has obvious thermal
implications for package and heat sink design.

The SS SPICE models are used to predict the speed 
at which the slowest chips will run. The TT models 
are used to check that the circuits on the chip will 
run at the desired speed when the manufacturing 
process is at the center of its range. The bulk of the 
circuit design work for the NVAX and Alpha 21064 
chips was done using the TT models. 

The FS and SF SPICE models are used to determine 
the noise margin for circuits (DC circuit analysis) 
rather than to predict the speeds at which circuits 
will run (AC analysis). The FS model has a semi-fast 
PMOS transistor paired with a semi-slow NMOS transistor;
the SF model is just the opposite. The parameters
that determine $I_{DSAT}$ for PMOS and NMOS
transistors are correlated. These correlations are 
taken into account in the FS and SF models. The cor- 
rect operation of some CMOS circuits is particularly 
sensitive to the ratio of the PMOS to NMOS currents. 
By using the FS and SF models to simulate these cir- 
cuits, designers can verify that the circuits will 
function correctly in spite of variations in the man- 
ufacturing process. A simple example is given in 
Figure 10, which shows how the switching point of
a CMOS inverter changes from FS to TT to SF. Circuit
designers use DC simulations like these to deter- 
mine the safe bounds for the sizes of transistors 
in a variety of common circuit structures. These 
bounds are incorporated into the design methodol- 
ogy for the project, and CAD tools are used to 
search the circuit schematic database for structures 
that violate the methodology. 
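
The way the inverter switching point moves between the FS, TT, and SF corners can be sketched with the textbook long-channel square-law model. This is not the SPICE analysis described above: the supply, the threshold voltages, and the 10 percent drive-strength skew are assumed values, and real corner models also account for the parameter correlations mentioned in the text.

```python
import math


def switching_point(v_dd, v_tn, v_tp_abs, beta_n, beta_p):
    """Inverter switching voltage from the long-channel square-law model.

    Equating the NMOS and PMOS saturation currents with V_in = V_out = V_M:
        V_M = (V_DD - |V_TP| + r * V_TN) / (1 + r),  r = sqrt(beta_n / beta_p)
    """
    r = math.sqrt(beta_n / beta_p)
    return (v_dd - v_tp_abs + r * v_tn) / (1.0 + r)


V_DD, V_TN, V_TP_ABS = 3.3, 0.6, 0.6   # volts; assumed values
SKEW = 0.10                            # assumed +/-10% drive-strength skew

# Corner labels: first letter is the PMOS device, second is the NMOS device.
corners = {
    "FS": (1.0 - SKEW, 1.0 + SKEW),   # (beta_n, beta_p): slow NMOS, fast PMOS
    "TT": (1.0, 1.0),
    "SF": (1.0 + SKEW, 1.0 - SKEW),   # fast NMOS, slow PMOS
}
v_m = {name: switching_point(V_DD, V_TN, V_TP_ABS, bn, bp)
       for name, (bn, bp) in corners.items()}
for name, value in v_m.items():
    print(f"{name}: V_M = {value:.3f} V")
```

In this simplified model the switching point sits at half the supply for the TT corner, rises when the PMOS device is stronger (FS), and falls when the NMOS device is stronger (SF).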

When creating schematics, circuit designers use 
technology-specific rules of thumb to estimate 
the interconnect capacitance on signal lines, etc. 
Layout for the schematics is then generated, exact 
capacitances are extracted from the layout using 
CAD tools, the capacitance estimates are replaced 
with the extracted values, and the circuits are 
resimulated to ensure that they still meet the 
specifications. The capacitance extraction tool can 
be rerun for a different process corner (dielectric 
thicknesses, etc.) by changing its parameter file. 

Although not discussed here, environmental 
effects such as operating temperature and power 
supply variations must also be taken into account.

If CMOS chips are to be manufacturable, designers
must account for process variations during the
design phase by following procedures such as those
just outlined. In order to get to market quickly,
the NVAX microprocessor was being designed while
the CMOS-4 process was being developed. Process
simulators and test chips were used to generate the
CMOS-4 worst-case and typical electrical models
for transistors and interconnect during the early
phase of process development. The accuracy of
these models was critical to the successful and
timely completion of the NVAX design: it was never
necessary to redesign circuits due to process or
model changes during the course of the project.

Conclusion 

Digital's CMOS processes have been developed
specifically for high-performance microprocessors.
Generation-to-generation improvements derived
from scaling, increased die area, and new technology
features have allowed increased performance
every two years. The Alpha 21064 and NVAX chips,
implemented in CMOS-4, are the highest performing
reduced instruction set (RISC) and complex instruction
set (CISC) microprocessors reported in the
industry to date.

Numerical Device and
Process Simulation Tools
in Transistor Design

Numerical device and process simulation programs are fundamental tools in the 
design and characterization of silicon transistors. These tools employ numerical 
mathematical methods to simulate the operation of the elemental transistor structures
that are the building blocks of CMOS VLSI circuitry. When designing these
basic structures, CMOS process and device design teams require efficient, high-performance
simulators that use accurate physical models. Digital has developed
thermal annealing, mobility, and avalanche models, and has improved the numer- 
ical methods used in its process and device simulation programs. Also, supporting 
software was developed to help integrate the various simulation tools. 



With the increasing number of transistor functions 
implemented on each chip in complementary 
metal-oxide semiconductor (CMOS) very large-scale 
integration (VLSI) circuitry, computer simulation 
tools have become essential at all levels of the 
design process. This is particularly true at the level 
of metal-oxide semiconductor (MOS) device fabri- 
cation and elemental transistor design. The con- 
tinuing reduction in the geometrical scale of the 
transistor structures means that careful control and 
design of these structures is necessary to maintain 
the required switching properties and current drive 
capabilities. 

Process and device simulators contribute to this 
design and control by employing numerical mathe- 
matical methods to simulate the ion flux that 
occurs in the device fabrication process and the 
current flow that occurs during transistor opera- 
tion. Process simulation requires physical models 
of ion implantation, diffusion of ions, and thermal 
oxidation, for example. In device simulation, the 
basic semiconductor equations of drift and diffu- 
sion are solved using microscopic physical models, 
such as models of the mobility of electrons and 
holes and models of electron-hole pair generation, 
also referred to as avalanche generation. 

For practical application, the simulators must
be capable of high-speed performance and must
provide accurate predictions. Digital's Advanced
Semiconductor Development (ASD) Submicron
Simulation Group modifies and extends the capabilities
of simulators that have already been developed.
These modifications and extensions have resulted in
accelerated performance, higher accuracy, and thus
improved application of the simulators to Digital's
CMOS technologies. The process simulators SUPREM3
from Stanford University and PROMIS from the
Technical University of Vienna (TUV), Austria, and
the device simulators MINIMOS from the TUV and
PISCES from Stanford are the main simulators that
Digital modified and applied. In addition, the ASD
Group has developed software to support the simulators.
This software consists of programs to interface
the process simulators with the device
simulators, and programs to automatically perform
simulator calibration and sensitivity analysis.

The modified simulators and supporting soft- 
ware are vital elements in the development of 
Digital's CMOS technologies. These tools provide 
invaluable insight into transistor fabrication and 
operation. Using these tools in the design of lot 
splits considerably decreases fabrication time and 
thus reduces cost. To predict circuit performance 
limits, designers use calibrated simulation results 
as input to circuit simulators. The ability to predict
these limits has made possible concurrent
technology and circuit design of Digital's CMOS-2,
CMOS-3, CMOS-4, and CMOS-5 technologies.

This paper describes the nature of Digital's 
process and device simulation tools. Examples of 
the important physical models, numerical mathe- 
matical methods, and supporting software for the 
simulators are discussed. The paper closes with a 
summary of how these tools benefit Digital's semi- 
conductor process development teams. 

Physical Models 

The key to accurate simulations of transistor char- 
acteristics is in the physical models employed by 
the programs. This section describes examples of 
models developed by the ASD Group. First, a rapid 
thermal annealing model for process simulation is 
presented. Next follow discussions on ion implan- 
tation through nonplanar surfaces in two dimen- 
sions and on the mobility and avalanche models 
used in the MINIMOS program. The section con- 
cludes with information about the use of simula- 
tors to predict transistor capacitance values. 

Rapid Thermal Annealing Model 
To alter the electrical properties of the semiconduc- 
tor substrate material, i.e., the silicon wafer, atoms 
from groups III and V of the periodic table are 
used for doping. A known amount of these atoms, 
also called impurities or dopant, must be placed 
in the silicon lattice. Ion implantation is the main 
technique for incorporating the impurity atoms. 
However, the implanted atoms do not move into 
the proper sites upon implantation. A high- 
temperature treatment, known as thermal anneal- 
ing, is required to achieve this. 

There are two types of thermal annealing used in 
semiconductor processing: conventional furnace 
annealing (CFA) and rapid thermal annealing (RTA). 
CFA is a long-term (minutes to hours) annealing 
step carried out at moderate temperatures (below 
1000 degrees Celsius). RTA is a recent technique 
conducted at higher temperatures (usually above 
1050 degrees Celsius) for extremely short times 
(seconds). 

Ion implantation is a defect-producing process 
that creates lattice disorder and point defects, such 
as silicon vacancies and interstitials, in an other- 
wise perfect lattice. (A silicon vacancy occurs 
when a silicon atom is missing from a perfect sili- 
con lattice, whereas a silicon interstitial occurs 
when an extra silicon atom is squeezed into a perfect
lattice.) Since ion implantation is performed at
room temperature, the collection of implantation- 
induced defects is retained in a stable state in the 
lattice. However, at high temperatures the defects 
become highly mobile and influence the migration 
of impurity atoms. 

The migration of impurities under annealing 
is governed by the diffusion phenomena, which 
are mediated by point defects. Ion-implantation- 
induced point defects can cause anomalous dif- 
fusion of the dopant. This effect is highly transient 
in nature because either the ion-implantation- 
induced defects disappear to the surface or to the 
depth of the silicon (referred to as the silicon 
"bulk"), or the defects may recombine while inter- 
acting with the dopant. An accurate understanding 
of point defect behavior is particularly important 
for small geometry transistors requiring ultra- 
shallow junctions with high doping levels. 

Although research in point defect physics in sili- 
con is extensive, point defect behavior is still not 
well understood. One contributing factor is the 
lack of a reliable technique to study point defects 
quantitatively, i.e., knowledge of point defects is 
obtained only through indirect observations of 
their effect on dopant diffusion. There remains 
significant controversy surrounding the experi- 
mental observations and the corresponding inter- 
pretations. At the macroscopic level, there are 
models that solve the diffusion equation using an 
average diffusivity for dopant. However, the con- 
ventional diffusion models do not take into account 
the details of defect-dopant interactions resulting 
from high-dose ion implantation. Recently, several 
models, both physically based and empirical, have 
been proposed to simulate transient-enhanced diffusion
under high-dose ion implantation. The physically
based models require an accurate account
of ion-implantation-induced defect concentration 
and often use time-consuming Monte Carlo meth- 
ods. Empirical models, on the other hand, are fast, 
but must be based on detailed physics and well- 
calibrated parameters. 

The ASD Group developed a new empirical
model called implantation-enhanced transient diffusion
(IETD) [1]. This model has been used successfully
in the CMOS technology development in
Digital's semiconductor manufacturing facility in
Hudson, MA. The IETD model is phenomenological
and is based on a dual vacancy-interstitial mechanism,
with model parameters determined empirically.
The model uses a relationship that links the
amount of ion-implantation-induced defects to ion
implantation conditions, such as dose and energy.

The overall transient diffusion process, which 
depends on the annealing temperature, takes from 
several seconds to a few minutes to complete. Con- 
sequently, a relatively short time interval is available 
to limit the role of point defects and to control the 
transient diffusion, particularly when fabricating 
shallow junctions below 0.2 micron (μm) for devices
less than 0.5 μm in size. The IETD model provides
good predictability of transient diffusion, based on 
point defect behavior. This feature allows designers 
to study the effect of various processing conditions 
on transient diffusion and thus to optimize the pro- 
cess. Figure 1 shows a comparison of secondary ion 
mass spectroscopy (SIMS) data with arsenic diffu- 
sion, both with and without the use of the IETD 
model in the SUPREM3 process simulator. Clearly, 
using the IETD model gives more accurate results. 
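
Why a transient lasting only seconds to minutes matters so much for short anneals can be shown with a generic toy calculation. The sketch below is not the IETD model: it simply assumes a diffusivity that starts enhanced and decays exponentially to its equilibrium value, and every parameter value is hypothetical. It also holds temperature fixed, which real furnace and rapid thermal anneals do not.

```python
import math


def effective_dt(d0, t, enhancement, tau):
    """Time integral of D(t') = d0 * (1 + enhancement * exp(-t'/tau)).

    d0          -- equilibrium diffusivity
    t           -- anneal time
    enhancement -- initial extra diffusivity, as a multiple of d0
    tau         -- decay time of the implant-induced transient
    """
    return d0 * (t + enhancement * tau * (1.0 - math.exp(-t / tau)))


def transient_share(t, enhancement, tau):
    """Fraction of the total diffusion (Dt product) due to the transient."""
    total = effective_dt(1.0, t, enhancement, tau)
    return (total - t) / total


# Assumed, purely illustrative numbers: 100x initial enhancement, 5-s decay.
for anneal_s in (10.0, 1800.0):
    share = transient_share(anneal_s, 100.0, 5.0)
    print(f"{anneal_s:6.0f} s anneal: {share:.0%} of Dt from the transient")
```

With these assumed numbers the transient dominates the total diffusion in the short anneal and is a minority contribution in the long one, which is why a model of the transient is needed when shallow junctions are formed with short anneals.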




[Figure 1 plot. Horizontal axis: depth into silicon (microns). Key: SIMS data with arsenic diffusion; SUPREM3 without IETD model; SUPREM3 with IETD model.]

Figure 1 Comparison of SIMS Data with
Results from Annealing Models

Implantation through Nonplanar Surfaces 
A problem frequently encountered in semiconduc- 
tor technology is the implantation of a dopant 
through thin dielectric layers on semiconductors 
or, in general, the implantation in multilayer struc- 
tures. Calculating the exact depth distribution of 
implanted ions in a multilayer structure requires 
the solution of Boltzmann transport equations or 
Monte Carlo simulation. This process, however, is
very complicated and requires a great deal of computing
time.

To circumvent these disadvantages, several ana- 
lytical models have been developed that give rea- 
sonably good results under certain conditions. One 
of these models, the numerical range-scaling 
model, is applicable in the limit of both very thin 
films and very thick films and leads to the most realistic
profiles of all known analytical models [2, 3, 4].
Therefore, this model is implemented in the two-dimensional
(2-D) process simulator, PROMIS. Using
this numerical range-scaling model makes it easy to 
extend the 2-D model for arbitrarily shaped, non- 
planar multilayer structures. In addition, the tilt 
angle of the implantation can be varied. Thus, it is 
possible to see the dependence of the doping pro- 
file on the tilt angle. The change in the analytical dis- 
tribution function caused by changes in channeling 
by different implantation angles has not yet been 
considered. 
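
For orientation, the simplest analytical description of an implant in a single material is a Gaussian in depth, characterized by a projected range and a straggle. The sketch below shows only that single-layer Gaussian; it is not the numerical range-scaling model implemented in PROMIS, and the dose, range, and straggle are assumed round numbers.

```python
import math


def gaussian_profile(depth, dose, r_p, delta_r_p):
    """Gaussian approximation to an as-implanted concentration profile.

    depth, r_p, delta_r_p in cm; dose in ions/cm^2; result in ions/cm^3.
    """
    peak = dose / (math.sqrt(2.0 * math.pi) * delta_r_p)
    return peak * math.exp(-((depth - r_p) ** 2) / (2.0 * delta_r_p ** 2))


# Assumed, round-number parameters for illustration only.
DOSE = 1.0e15        # ions/cm^2
R_P = 0.16e-4        # cm, projected range
DELTA_R_P = 0.05e-4  # cm, straggle

peak = gaussian_profile(R_P, DOSE, R_P, DELTA_R_P)
print(f"peak concentration: {peak:.2e} cm^-3")

# Check: integrating the profile over depth should return the dose
# (here over +/- 3 straggles around R_P, so about 99.7% of it).
step = DELTA_R_P / 100.0
start = R_P - 3.0 * DELTA_R_P
integral = sum(
    gaussian_profile(start + (i + 0.5) * step, DOSE, R_P, DELTA_R_P) * step
    for i in range(600)
)
print(f"integrated dose: {integral:.3e} cm^-2")
```

A multilayer model has to go further than this, because the range and straggle differ in each layer the ions cross, which is the problem the analytical models discussed above address.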

Figure 2 shows an application of the newly 
implemented ion implantation feature in PROMIS. 
The simulation starts from a rectangular semicon- 
ductor region. The first process step is a local wet 
oxidation for 30 minutes at 1100 degrees Celsius.
Boron ions are implanted into this structure with 
an energy of 50 kilo electron volts (keV) and at a 
dose of 5.0 X 10 1 7cm 2 . The iso-concentration lines 
of the 2-D as-implanted boron profile can be seen
in Figure 2. Each contour line represents an order
of magnitude change in boron concentration.

Figure 2 Two-dimensional Boron Implant into
Silicon, Simulated Using PROMIS

Although the numerical range-scaling model is
the most accurate way to analytically describe 
implanted dopant distributions for multilayer struc- 
tures, the model has some shortcomings and 
restrictions. For example, the analytical distribu- 
tion function does not depend on the tilt angle and 
the silicon orientation. This shortcoming could 
probably be solved by introducing a term that 
depends on these two parameters and could be 
fitted using measured profiles. Another shortcom- 
ing, as compared to a Monte Carlo simulation, is 
that the lateral distribution function does not con- 
sider interfaces. This deficiency could be especially 
important for structures with very steep interfaces, 
such as trenches. However, this neglect of the inter- 
faces in the case of the lateral distribution function 
is only significant if the mean atomic numbers of 
the materials (on both sides of the interface) differ 
considerably. In the case of Si and silicon dioxide
(SiO2), for example, it is not a major shortcoming.

The model described above is implemented in 
PROMIS in such a way that there are, in principle, no 
restrictions in the simulation geometry. This means 
that the implantation module can handle arbitrary 
multilayer structures. The implementation of this 
ion implantation model is thus a significant step 
towards extending PROMIS to a fully multilayer sim- 
ulation tool. 

Mobility Model 

Carrier mobilities in semiconductor material are 
determined by a large variety of physical mecha- 
nisms. Electrons and holes are scattered by thermal 
lattice vibrations, ionized impurities, neutral impu- 
rities, vacancies, interstitials, dislocations, surfaces, 
and the carriers themselves. The saturation of the 
drift velocity caused by interactions with lattice 
vibrations results in a further mobility reduction. 
For MOS transistors, however, the effect of the surface,
i.e., the silicon-silicon dioxide (Si-SiO2) interface,
is of overriding importance. The sheet of
conducting charge (either electrons or holes), 
called the inversion layer, is forced by the applied 
electric fields to flow close to this interface and 
interact with it. 

The surface scattering is a complex and poorly
understood process consisting of a combination of
roughness, interface charge, and surface phonon
scattering mechanisms. Nevertheless, there is a
universal empirical model, first demonstrated in 1979
and 1980, that can be calibrated against measured
results. 5,6 An effective mobility exists that depends
only on an effective field perpendicular to the
silicon surface and is independent of the doping level
near the surface.

The ASD Group modified the microscopic mobility
model in the MINIMOS simulator to reflect this
universal mobility model. 7 The microscopic model
contains three adjustable dimensionless parameters
called MR, MT, and MX, which are generally
close to unity. These parameters scale the magnitude
and the two field dependencies of the microscopic
mobility. The model was originally
calibrated against CMOS-2 electrical data in 1986.
Note particularly that recent CMOS-4 data compare
well to simulated data with only minor adjustments
made to these three parameters. Figure 3 compares
MINIMOS simulation with measured data for a
long-channel (51-micron) device, with the linear
region drain current graphed as a function of
the gate voltage (Vg). In this section and the
following two sections, the width of all the simulated
and measured MOS devices is 50.5 µm. The scaled
mobility parameters for this excellent fit are an MR
of 1.02, an MT of 1.10, and an MX of 1.00, as
compared to the default 1986 values of 1.00 for each
parameter. Figure 4 shows the fit for a CMOS-4
short-channel device, i.e., 0.62 µm in length (an
effective length of 0.5 µm), using the same mobility
parameters as for the long-channel device.
Adjustments were made to allow for the effects of
interface charge and contact resistance. The agreement
of the simulated results with the measured data is
again excellent and demonstrates the validity of the
MINIMOS mobility model.

Figure 4 Comparison of MINIMOS Simulation
to Short-channel CMOS-4 Device Data

Avalanche Model 

An important part of submicron MOS device design
is device reliability. Aggressively scaled devices
contain high electric fields. These fields are
unavoidable, even with the reduced power supply
of 3.3 volts for the CMOS-4 device. In addition,
circuit effects, such as ringing, can cause voltages on
devices to rise well above the magnitude of the
power supply. Electrons in these high electric fields
accelerate and begin to acquire energy faster than
they can dissipate it to the underlying silicon
lattice. A small fraction of the electrons, namely
energetic or "hot" electrons, gain enough energy to
generate an electron-hole pair in a process called
impact ionization. The newly created electrons are
swept to the drain of the transistor, and the holes
are collected as a current called substrate current.

Other energetic electrons surmount the barrier 
at the Si-SiO2 interface and inject themselves into
the oxide, appearing as gate current, which flows 
out of the gate terminal. Carrier injection into the 
oxide can damage the oxide and cause a shift in the 
device threshold voltage or degrade the ability of 
the device to conduct current. This injection is the 
physical cause of device degradation. 

Unfortunately, gate current is difficult to measure
directly, because the currents involved are
extremely small. As a fallback, it is common practice
to use the substrate current as a monitor of
hot electron damage, since both currents arise
from the same high field conditions in the device.
Using an accurate substrate current model can
assist device designers in optimizing the CMOS
transistor for reliability.

In late 1988, at the time of early CMOS-4
development, the avalanche generation model in the
MINIMOS simulator was reexamined. The design
team found that the peak substrate current for a
given drain voltage (Vd) occurred at gate voltages
lower than predicted by simulation. They resolved
this problem by modifying the microscopic model
for impact ionization in the MINIMOS simulator to
include a depth-dependent term, similar to the one
used by Slotboom et al. 8,9 The modification sharply
reduced impact ionization near the Si-SiO2
interface. Impact ionization model parameters were
derived from CMOS-3 and CMOS-4 measured data.

Figure 5 shows measured and simulated CMOS-4
substrate current data for drawn gate lengths of
0.62 and 2.00 µm at a Vd equal to 3.3 volts. Figure 6
shows the substrate current as a function of Vg
for the 0.62-micron gate length device for three
drain voltages around the design center of 3.3 volts.
The agreement shown using one set of model
parameters is quite remarkable, given the limited
physics and empirical nature of the model.
Although the modified model has proven satisfactory
for CMOS-4 devices, we are examining other,
more physically based, models of substrate and
gate current for our future generations of CMOS
devices.

Figure 6 Comparison of MINIMOS Simulation
to CMOS-4 Substrate Current Data
for a Short-channel Device and
Three Drain Voltages

Capacitance Simulation 
The speed of MOS circuits partially depends on the
amount of device capacitance that must be charged
during circuit state transitions. Device simulators
can tell device designers how large the device
capacitances will be and what effect changes in the
manufacturing process will have on capacitance size.
The gate capacitance of the MOS transistor is one of
the two key types of device capacitances; the other
consists of the so-called "diode" capacitances of the
source and drain junctions. The gate capacitance is
split into three parts: gate-to-source (Cgs),
gate-to-drain (Cgd), and gate-to-bulk (Cgb). Figures 7
and 8 show simulated and measured Cgs and Cgd values
versus gate and drain voltages for a device with a
width of 50.50 µm and a length of 0.62 µm. Cgb is not
shown because it is a negligibly small quantity for
voltages above the transistor threshold voltage.

Achieving the good agreement between simulation
and measurement shown in Figures 7 and 8 is
a difficult undertaking. Accurate capacitance
measurements on such short-channel devices require
careful experimental techniques to reduce the
effects of parasitic capacitance and to ensure that
the DC power supply can sink sufficient current.
For example, with the aid of simulation, it was
shown that small parasitic resistances, on the order
of 10 ohms, can significantly shift the capacitance
curves of the device. Such resistances must be
minimized to yield accurate results and to verify that
simulation and measurement can be correlated.

Figure 7 Comparison of MINIMOS Simulation
Data to CMOS-4 Gate-to-source
Capacitance Data for Four Drain
Voltages

On the simulation side, uncertainties in the con- 
centration of dopant in the channel region and 
the source-drain regions have a significant impact 
on the capacitance characteristics. The ASD Group 
used this fact to advantage in a "reverse engineer- 
ing" of the doping profiles. They compared the mea- 
sured data with the simulated capacitance using 
doping profiles generated by the process simula- 
tors. Then, they revised the profiles until the simu- 
lated capacitances matched the measured ones. 
The ASD team applied this reverse engineering
method to both the channel doping in depth and the
lateral source-drain doping. In the former case, the
deep depletion capacitance-to-voltage measurements
of doping were verified using simulation. In
the latter, the doping was extracted by examining
the relationship between Cgd and Vd when no Vg
is applied. Using this method, it was possible to
extract doping profiles that were unattainable by
other analytic means, such as SIMS.

The CMOS-4 devices exhibit a slight polysilicon
depletion effect, which decreases the measured
capacitance of the MOS transistor gate. Work
performed in parallel by the ASD Group and at the TUV
has resulted in an implementation of a polysilicon
depletion model in the MINIMOS simulator. 10 The
model is unique in that it analytically solves the
Poisson equation in the polysilicon gate and thus
can be implemented in MINIMOS as a simple
modification of the gate boundary condition. Figure 9
shows a comparison of the model with simulation.
The measured total gate capacitance of a large-area
MOS diode (50.5 by 50.5 µm) as a function of Vg is
presented, along with MINIMOS simulation results.
For reference, Figure 9 also shows the simulated
gate capacitance without the polysilicon depletion
effect, computed using the PISCES device
simulation program. The PISCES results were used as
an independent check on the MINIMOS simulator
results. The curves indicate that the modeled
results are a good approximation of the measured
data. At CMOS-4 dimensions, the polysilicon
depletion effect is rather small, i.e., on the order of
5 percent. The effect becomes more pronounced at
thinner gate oxides.
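
The trend in the last two sentences can be sketched with a simple series-capacitor estimate. This is only an illustration and not the analytical model implemented in MINIMOS: it assumes that the depleted polysilicon layer acts as a fixed-width capacitor in series with the gate oxide. The 16-angstrom depletion width and the 65-angstrom oxide are hypothetical values; the width is chosen so that the 105-angstrom case lands near the 5 percent quoted above.

```python
# Illustrative series-capacitor estimate of the polysilicon depletion effect.
# Assumption: a fixed depletion width in the polysilicon. In a real device the
# width depends on gate bias and polysilicon doping.

EPS_OX = 3.9    # relative permittivity of silicon dioxide
EPS_SI = 11.7   # relative permittivity of silicon


def capacitance_loss(t_ox_angstrom, w_dep_angstrom=16.0):
    """Fractional loss in gate capacitance caused by the depleted layer."""
    c_ox = EPS_OX / t_ox_angstrom          # per-area capacitance, arbitrary units
    c_poly = EPS_SI / w_dep_angstrom       # depleted polysilicon layer
    c_total = 1.0 / (1.0 / c_ox + 1.0 / c_poly)
    return 1.0 - c_total / c_ox


for t_ox in (300.0, 105.0, 65.0):          # 65 A is a hypothetical thinner oxide
    loss = capacitance_loss(t_ox)
    print(f"t_ox = {t_ox:5.0f} A  capacitance loss = {100.0 * loss:4.1f} %")
```

Because the oxide capacitance grows as the oxide thins while the assumed polysilicon capacitance stays fixed, the series term takes a larger share of the total, which is the qualitative behavior described in the text.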
Numerical Mathematical Methods

The efficient use of CPU time and memory in the 
device and process simulators is directly depen- 
dent on the efficient implementation of state-of- 
the-art numerical mathematical algorithms and 
physical models. Because of the limitations on
representing numbers in computers, these algorithms
and models must incorporate many detailed 
enhancements in order to maintain stability of the 
simulation programs in the numerical and physical 
sense. This is especially true in the case of simula- 
tion in three space dimensions. This section first 
describes the improvements to the MINIMOS device 
simulator achieved during the last few years. An 
application of the improved MINIMOS to a parasitic 
device in shallow trench isolation is then pre- 
sented, followed by a discussion of automatic space 
grid design. 

Performance Improvements Achieved 
for the MINIMOS Device Simulator 
A three-dimensional (3-D) simulation program
requires the solution of very large matrix systems and
is thus time-consuming. Therefore, an economical use
of resources is obligatory. This goal was achieved for
the MINIMOS device simulator by means of software
techniques and hardware options. 11,12
Various algorithms and numerical methods were
thoroughly tested with practical examples, because
not all algorithms presented in the literature are
applicable. Most of the proposed algorithms, and
most of the improvements related to overall
performance gain, are found among the linear solvers.
These solvers are applied to the solution of the
large sparse matrix systems that arise from the
discretization of the partial differential equations.
The best-known linear solver algorithm is called
Gaussian elimination; this algorithm is the most
stable method of solving a system of linear equations.
Unfortunately, Gaussian elimination is generally
not applicable to the 3-D simulations because
enormous amounts of memory and CPU time are
required. Instead, iterative solvers have been tested
for stability, using several techniques, which are
required because of the variation over a range of
many orders of magnitude in the coefficients and in
the dependent variables.
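
As a hedged illustration of the kind of iterative solver discussed above, the following sketch applies conjugate gradients with diagonal (Jacobi) scaling to a small sparse system obtained by discretizing a one-dimensional Poisson equation. The choice of method, the test problem, and all function names are assumptions made for this example; the text does not name the solvers adopted in MINIMOS.

```python
# Sketch: diagonally preconditioned conjugate gradients for a sparse,
# symmetric positive-definite system. Illustrative only; the solver choice
# is an assumption and is not taken from MINIMOS.


def matvec(diag, off, x):
    """Multiply a symmetric tridiagonal matrix by x, storing no zeros."""
    n = len(x)
    y = [diag[i] * x[i] for i in range(n)]
    for i in range(n - 1):
        y[i] += off[i] * x[i + 1]
        y[i + 1] += off[i] * x[i]
    return y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def pcg(diag, off, b, tol=1e-10, max_iter=1000):
    """Solve A x = b; returns the solution and the iteration count."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                  # residual for x = 0
    z = [r[i] / diag[i] for i in range(n)]       # diagonal scaling
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for iteration in range(1, max_iter + 1):
        ap = matvec(diag, off, p)
        alpha = rz / dot(p, ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, iteration
        z = [r[i] / diag[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


# -u'' = 1 on (0, 1) with u(0) = u(1) = 0, on n interior grid points.
n = 50
h = 1.0 / (n + 1)
diag = [2.0 / h ** 2] * n
off = [-1.0 / h ** 2] * (n - 1)
solution, iterations = pcg(diag, off, [1.0] * n)
print(iterations, max(solution))
```

Only the nonzero diagonals are stored, which is what makes iterative methods attractive when direct elimination would exhaust memory. In this test problem the diagonal is constant, so the scaling has no effect; it becomes relevant when the matrix coefficients span many orders of magnitude, as the text describes.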

After a careful comparison of the convergence
behavior of the different linear iterative solvers, a 
final set of algorithms was identified, one which is 
adequate for the present. These algorithms make 
the 3-D simulation program more efficient by taking 
into account the different properties of the Poisson 
equations (solutions for electric potential) and the 
continuity equations (solutions for current flows). 

Strong emphasis has been placed on reducing the
amount of duplicate code in the simulation program.
MINIMOS is a hierarchically ordered program
that first solves an approximate 2-D problem in
three steps and then proceeds to solve the actual
3-D problem with the 3-D algorithms. This method
guarantees an efficient use of the computer
resources. However, sometimes the incompatibility
of the 2-D and 3-D codes means that separately
functioning code must be used. Additional development
work has been done to create common code
in all important areas, including physical models
(e.g., mobility models) and the numerical algorithms.
Future upgrades and modifications for 2-D
and 3-D codes can be done simultaneously. This
improvement also contributes significantly to the
performance improvement.

An investigation using the VAX Performance and
Coverage Analyzer (PCA) was carried out to identify
so-called "hot spots," which are code sequences
that consume high amounts of CPU time when
executed. Eliminating or improving the performance
of these code segments significantly reduces
simulation time.



The VAX FORTRAN high-performance option
(HPO) compiler was also used to improve the
performance of the MINIMOS 3-D simulation program.
The ASD Group investigated both parallelization
and vectorization on a multiprocessor VAX system
with vector hardware. The performance improvement
resulting from parallelization (using two
CPUs) and vectorization (using two vector units)
was as much as a factor of five. During the last
few years, software enhancements, including the
removal of hot spots, the improvement of algorithms,
and the creation of common code, have
resulted in a thirtyfold performance improvement
of the program. Thus, the combined software and
hardware enhancements improved the performance
of the MINIMOS 3-D simulation program by
approximately a factor of 150.

Many of these improvements also enhance per- 
formance on scalar processors. The MINIMOS pro- 
gram is run on a variety of VAX and MIPS processors 
at several sites in the United States and in Europe. 
Engineers can submit MINIMOS (or other) simula- 
tion programs from their VAXcluster systems run- 
ning under the VMS operating system to various fast 
processors on the local network. This improves the 
overall turnaround time of computationally inten- 
sive jobs and provides a painless way for engineers 
to better use available computational resources. 

Application to a Parasitic Device 
The improved performance of the 3-D MINIMOS sim- 
ulation program made it possible to analyze the 
behavior of shallow trench isolated devices. 
Electrical data on these devices in early develop- 
ment work showed a "bump" in the subthreshold 
drain current versus gate voltage characteristic 
under certain conditions. The ASD Group simulated 
this bump using the 3-D MINIMOS program and 
demonstrated that the origin of the bump was in 
a parasitic current along the trench sidewall. The
2-D version of the MINIMOS program was unable 
to explain the phenomenon. Only when the 3-D 
version was applied to the problem (taking into 
account a slight overlap of the gate polysilicon into 
the trench) was it apparent that a parasitic device at 
the edge of the trench was turning on at a lower 
gate voltage than was the main device, thus causing 
the bump. 

Figure 10 shows a comparison of the MINIMOS
simulations with the electrical data. The bump is
resolved by the 3-D MINIMOS simulation but is not
present in the case of the 2-D simulation. The
parasitic device turns on before the main device, but its
effective width is much less than that of the main
device. Thus at higher gate voltages, the main
device, which can be simulated with the 2-D
version of the MINIMOS program, is dominant. The
simulations have been adjusted for threshold and back
bias. Further simulations indicated that increased
back bias on the device greatly enhances the bump;
the parasitic device at the corner of the trench
is partially shielded from the effects of the back
bias. The improved 3-D MINIMOS program is
currently used in the design of the CMOS-5 technology,
for continued electrical analysis of shallow trench
isolation.
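
The mechanism described above, a narrow parasitic device that turns on early in parallel with a wide main device, can be illustrated with a toy calculation. The sketch below is not a MINIMOS result. It assumes a textbook-style smooth interpolation between weak and strong inversion, and the corner-device width (0.1 µm), the two threshold voltages, and the slope factor are hypothetical values chosen only to make the effect visible; the 50.5-µm main width is the device width used in this paper.

```python
import math

THERMAL_VOLTAGE = 0.0259  # kT/q near room temperature, in volts


def drain_current(v_gate, width_um, v_threshold, slope_factor=1.4):
    """Smooth weak-to-strong inversion current in arbitrary units.

    Assumed interpolation form: exponential well below threshold and
    quadratic well above it. Not the MINIMOS model.
    """
    u = (v_gate - v_threshold) / (2.0 * slope_factor * THERMAL_VOLTAGE)
    return width_um * math.log1p(math.exp(u)) ** 2


def total_current(v_gate):
    """Return (main, corner) current contributions at a given gate voltage."""
    main = drain_current(v_gate, width_um=50.5, v_threshold=0.60)
    # Hypothetical corner device: far narrower, turns on 0.25 V earlier.
    corner = drain_current(v_gate, width_um=0.1, v_threshold=0.35)
    return main, corner


for v_gate in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    main, corner = total_current(v_gate)
    print(f"Vg = {v_gate:.1f} V  main = {main:.3e}  corner = {corner:.3e}")
```

With these assumed numbers the corner device carries more current than the main device deep in subthreshold, and the main device dominates by a wide margin once both are on. On a logarithmic plot of the summed current, that crossover appears as a bump.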
Automatic Grid Design

In process and device simulation, the numerical 
mathematics requires the construction of a dis- 
cretization scheme, an underlying space grid and, 
for time-dependent problems, a time grid, which 
should be controlled automatically. This basic grid- 
ding determines not only the accuracy of the solu- 
tion but also the run time of the program, and 
in some cases even the success or failure of the 
simulation. 

The design of an automatic space grid requires
both mathematically based and physically based
strategies. 13,14 Mathematical criteria for self-adaptive
gridding involve the remainders in series
expansions, equidistribution of the discretization error,
the degree of coupling of the differential equations
to be solved, and finally, the control of the ratio of
adjacent mesh distances. Attention to these details
results in the achievement of theoretical
convergence rates and reasonable CPU times.

Figures 11 and 12 show respectively an implanted 
boron profile and the error indicator for the masked 
implantation of boron. The mask ends at the origin, 
and the error criterion is based on the difference 
between the second mixed derivatives of the solu- 
tion function. Figure 12 clearly shows that the error 
is concentrated at the curvature of the profile 
around the mask edge. 
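
A one-dimensional sketch can show how an error indicator drives grid refinement. It is a simplification made for illustration: the indicator used here is the midpoint interpolation error of each interval rather than the second mixed derivative criterion mentioned above, the Gaussian profile and its parameters are hypothetical, and the ratio of adjacent mesh distances is not controlled.

```python
import math


def profile(x):
    """Hypothetical normalized Gaussian implant profile, peak at 0.3 um."""
    return math.exp(-0.5 * ((x - 0.3) / 0.05) ** 2)


def refine(grid, func, tol, max_passes=20):
    """Bisect every interval whose midpoint interpolation error exceeds tol."""
    for _ in range(max_passes):
        new_grid = [grid[0]]
        changed = False
        for a, b in zip(grid, grid[1:]):
            mid = 0.5 * (a + b)
            # Error of linear interpolation at the interval midpoint.
            error = abs(func(mid) - 0.5 * (func(a) + func(b)))
            if error > tol:
                new_grid.append(mid)
                changed = True
            new_grid.append(b)
        grid = new_grid
        if not changed:
            break
    return grid


coarse = [i * 0.1 for i in range(11)]          # uniform grid, 0.0 to 1.0 um
adapted = refine(coarse, profile, tol=1e-3)
near_peak = sum(1 for x in adapted if 0.1 <= x <= 0.5)
print(len(coarse), len(adapted), near_peak)
```

The refined grid places most of its points where the profile curves strongly and leaves the flat tails coarse, which is the behavior the error indicator in Figure 12 is meant to produce around the mask edge.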




Figure 11 Masked Ion Implantation of Boron 
Showing the Space Grid 

Almost all physical criteria are heuristic, i.e., they 
use quantities like the current densities or the net 
generation/recombination of carriers to design the 
grid properly in regions of physical interest. For 
example, numerical experiments indicate that in 
device simulation, mesh refinement is essential in 
regions where the net generation/recombination of 
carriers becomes higher than 1.0 × 10^22 /cm^3-s.

Conservation laws play an important role in the
definition of these strategies. For instance, coarse
grids at curved pn-junctions introduce a significant
integration error, thus leading to poor charge
conservation; accurate charge conservation is
essential for capacitance calculations. 15 For the
automatic calculation of time steps, heuristic criteria
are also used. These
criteria take into consideration the change in
boundary conditions and try to estimate the time
discretization error by analyzing the solution for
two successive time steps. More mathematically
based criteria, such as utilizing the change of the
potential energy, are under investigation.

Figure 12 Error Indicator for the Masked Ion
Implantation of Boron

Grid design strongly affects the run times of a
simulation program. Usually the numerical effort of
a two-dimensional simulation program depends
quadratically on the number of grid points. Thus a
50 percent reduction in the number of grid points
reduces the run time by a factor of four. Recent
work at the Campus-based Engineering Center
(CEC) in Vienna on gridding strategies in the PROMIS
simulator has resulted in improving performance
by a factor of 2.5 for the same accuracy. Automatic
grid design is absolutely necessary to optimize the
ratio between the number of grid points and the
desired accuracy for each specific boundary
condition. For time-dependent simulations, in particular,
moving grids are the only way to guarantee the same
accuracy for all time steps. Work is continuing in
this area at both the ASD Group in Hudson and
the CEC.
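
The arithmetic behind the factor of four can be written out as a short check. The function name and the adjustable exponent are introduced here for illustration; the quadratic dependence is the one stated in the text.

```python
def relative_run_time(point_fraction, exponent=2.0):
    """Run time relative to the original grid when only `point_fraction`
    of the grid points are kept and cost grows as N**exponent."""
    return point_fraction ** exponent


# Halving the number of grid points under quadratic cost.
speedup = 1.0 / relative_run_time(0.5)
print(speedup)
```

If the same quadratic rule applied to the PROMIS result, a gain of 2.5 would correspond to keeping roughly 63 percent of the grid points. That figure is an inference from the stated scaling and is not a number reported in the text.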

Technology Computer-aided Design 

Historically, process and device simulation tools
have been developed individually as standalone
tools. No single tool covers the whole range of
processing steps and device simulation needs that
can arise in CMOS transistor design. Consequently,
users must combine different tools to perform
required simulations. The ASD Group has developed
interface programs to assist in this effort. On a
larger scale, research is going on in industry and
in academia to develop the so-called technology
computer-aided design (technology CAD or TCAD)
frameworks for integrating the various simulation
tools. 16,17,18 The ASD Group has been involved in this
effort by actively participating in the
United States-based TCAD group of the CAD Framework
Initiative (CFI) and by interacting with the researchers
at the TUV, who are developing the Viennese Integrated
System for Technology CAD Applications (VISTA)
framework discussed later in this section. These
efforts hold much promise for the future, but the
ASD Group has required intermediate-term solutions
to speed up and automate the various procedures
used by its transistor designers. This section
presents two such solutions followed by a brief
description of the VISTA system currently under
development.

TCAD Command Language 
The simulation needs and demands of Digital's 
CMOS technology development have increased 
beyond isolated simulation runs to include more 
complex tasks such as sensitivity analysis, optimiza- 
tion, and macromodeling. To support these higher- 
level needs, the ASD Group developed the TCAD 
Command Language (TCL). TCL is a specialized pro- 
gramming language for TCAD task management and 
execution, an instance of a command extension 
language tailored to TCAD users' needs. The follow- 
ing list highlights features of TCL that are necessary 
in the ASD TCAD environment: 

■ General programming constructs and control 
mechanisms 

■ Specialized subroutines suitable for optimiza- 
tion and sensitivity analysis, with arguments 
divided into input parameters, and input and 
output variables 

■ Tool control and manipulation that support envi- 
ronment customization, journaling, distributed 
execution, and parameterized tool invocation 

■ Special analysis and optimization commands 
bundled in a callable run-time library 

TCL has been used in a variety of design and
analysis tasks that include exploration of the design
space, parameter sensitivity, design through
optimization, model parameter extraction and
characterization, and statistical analysis. For example,
Figure 13 shows TCL code that illustrates the
application of the TCL OPTIMIZE command for simulator
calibration. Using this code, the linear region
mobility parameters used for the MINIMOS program and
discussed in the Mobility Model section are
automatically adjusted so that the MINIMOS calculated
current values fit the experimental data. This
automates a routine and time-consuming task formerly
done manually.
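
The calibration loop that OPTIMIZE automates can be sketched in a general-purpose language. The example below is an illustration under stated assumptions, not a translation of TCL: the simulator is replaced by a toy formula, the search is a simple derivative-free coordinate search, only two scale factors are fitted, and the "measured" data are synthetic values generated with the MR and MT values quoted in the Mobility Model section.

```python
def toy_simulator(v_gs, mr, mt):
    """Stand-in for a MINIMOS run: linear-region drain current in
    arbitrary units, with magnitude scale mr and field scale mt."""
    overdrive = max(v_gs - 0.6, 0.0)          # assumed 0.6 V threshold
    return mr * overdrive / (1.0 + mt * 0.3 * overdrive)


def misfit(params, data):
    """Sum of squared differences between simulated and measured currents."""
    mr, mt = params
    return sum((toy_simulator(v, mr, mt) - i_meas) ** 2 for v, i_meas in data)


def pattern_search(objective, start, step=0.1, min_step=1e-6):
    """Try +/- step on each parameter, keep improvements, and halve the
    step whenever no trial improves the objective."""
    best = list(start)
    best_value = objective(best)
    while step > min_step:
        improved = False
        for index in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[index] += delta
                value = objective(trial)
                if value < best_value:
                    best, best_value, improved = trial, value, True
        if not improved:
            step *= 0.5
    return best


# Synthetic "measurements"; real use would read experimental data files.
voltages = [0.8 + 0.2 * k for k in range(12)]
data = [(v, toy_simulator(v, 1.02, 1.10)) for v in voltages]

fitted = pattern_search(lambda p: misfit(p, data), start=[1.0, 1.0])
print(fitted)
```

The structure mirrors the block in Figure 13: a routine maps parameters and inputs to a simulated output, and an outer optimizer adjusts the parameters, starting from the default values of 1.00, until the simulated currents match the data.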

Vienna Interactive Data Editor 
The Vienna Interactive Data Editor (VIDE) provides 
a set of tools for data manipulation and data trans- 
formation. The program, i.e., toolbox, consists 
of three parts: input, manipulation, and output. 
Separating the parts in this way makes it easier to 
implement additional data formats from a variety of 
programs. For this toolbox, there are five classes of 
operations that apply to either one, two, or three 
dimensions: 

■ Geometry handling (e.g., stretching, scaling, 
and shifting) 

■ Quantity handling (e.g., stretching, scaling, 
shifting, and least squares fit) 

■ Quantity arithmetic (e.g., multiplication, square 
root, and logarithm) 



■ Grid handling (e.g., creation of new grid, inter- 
polation, and merging of grids) 

■ Tools (e.g., plotting and rotation) 

Additionally, powerful macros can be defined as 
procedures in command files. The expansion of 1-D 
doping profiles to 2-D by elliptic rotations is an 
example of such a macro. To illustrate the expan- 
sion, this section describes the combination of the 
2-D source and drain profiles (arsenic and phos- 
phorus) of a MOS transistor simulated by PROMIS, 
and the 1-D channel profile (boron) simulated 
by SUPREM3. First, the 1-D boron profile from 
the SUPREM3 simulator is expanded to 2-D and 
stretched to the appropriate geometry. Then, the 
profile is interpolated on the grid defined by 
PROMIS. The boron profile is the acceptor profile, 
whereas the sum of the arsenic and the phospho- 
rus profiles gives the donor concentration. The 
result, consisting of the net 2-D doping, is shown in 
Figure 14. 
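
One plausible reading of the elliptic rotation and of the donor and acceptor bookkeeping described above is sketched below. It is not the VIDE macro: the mask-edge geometry, the lateral-to-vertical ratio of 0.7, the Gaussian boron profile, the uniform donor background, and the sign convention for net doping are all assumptions introduced for the example.

```python
import math


def channel_profile_1d(depth_um):
    """Hypothetical 1-D boron profile in cm^-3, peaked at the surface."""
    return 1.0e17 * math.exp(-0.5 * (depth_um / 0.15) ** 2)


def expand_elliptic(profile_1d, x_um, depth_um, lateral_ratio=0.7):
    """Expand a 1-D depth profile to 2-D around a mask edge at x = 0.

    For x <= 0 (open window) the 1-D profile is used unchanged. For x > 0
    (under the mask) the profile is rotated along ellipses whose lateral
    semi-axis is lateral_ratio times the vertical one.
    """
    if x_um <= 0.0:
        return profile_1d(depth_um)
    return profile_1d(math.hypot(depth_um, x_um / lateral_ratio))


def net_doping(donors, acceptors):
    """Net doping, positive where donors dominate (assumed convention)."""
    return donors - acceptors


# Boron (acceptor) values at a fixed depth, combined with an assumed
# uniform donor concentration of 1e16 cm^-3 purely for illustration.
for x in (-0.1, 0.0, 0.1, 0.3):
    boron = expand_elliptic(channel_profile_1d, x, depth_um=0.05)
    print(f"x = {x:+.1f} um  net = {net_doping(1.0e16, boron):+.3e} cm^-3")
```

In the workflow described in the text, the donor term would be the sum of the arsenic and phosphorus profiles from PROMIS, and the expanded boron profile would first be interpolated onto the PROMIS grid before the two are combined.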

The applications of VIDE are not restricted to
process and device simulation. The toolbox allows the
manipulation of data of arbitrary origin. The
program has been successfully used to analyze the
accuracy of simulation results through calculation of
the absolute and relative error, as well as for least
squares fits to measurement data. Furthermore,
VIDE serves as a transformation program between
different data formats of simulators.

BEGIN

  ! First the block definition.

  DEFINE BLOCK TUNE
    PARAMETERS MR, MX, MT
    INPUTS L, VGS
    OUTPUTS IDS
  BEGIN
    READ COMMANDS T1.MMI TEMPLATE

    REPLACE $MR-VALUE$   /BY=MR   /IN=TEMPLATE
    REPLACE $MX-VALUE$   /BY=MX   /IN=TEMPLATE
    REPLACE $MT-VALUE$   /BY=MT   /IN=TEMPLATE
    REPLACE $VGS-VALUE$  /BY=VGS  /IN=TEMPLATE
    REPLACE $L-VALUE$    /BY=L    /IN=TEMPLATE
    FWRITE MINIRUN.MMI TEMPLATE

    MINIMOS MINIRUN.MMI /OUT=MINIRUN.MMO /DOPING=T1.2DOP
    READ MINIRUN.MMO IDS /INTO=IDS
  END

  ! Initial parameters values in INIT.PAR, Final values in T1.SAV
  ! Experimental data in T1.DAT.

  OPTIMIZE TUNE /DATA=T1.DAT /INIT=T1.PAR /SAVE=T1.SAV
END

Figure 13 TCAD Command Language Code Illustrating the Application of the OPTIMIZE Command

Figure 14 Net Doping Profile Combining the
2-D Source and Drain Profiles from
PROMIS with the 1-D Channel
Profile from the SUPREM3
Simulator

VISTA Framework 

VISTA, developed by Professor Selberherr's group at
the TUV, is the first working TCAD framework based
on a standardized data format. 19,20 A brief
description of the VISTA system follows.

The Back End A multilanguage programming
interface, e.g., FORTRAN, C, or XLISP, permits access
to the simulation database. Such an interface has a
well-defined data format, which is a basic
requirement for tool-to-tool communication. The Program
Interchange Format (PIF) data format in the VISTA
environment provides a standardized way to
transport simulation data, while at the same time
remaining open to future demands and extensions
through its highly flexible structure.

The Front End The point-and-click interface of
the TCAD shell, together with a visual programming
interface, allows easy interaction for
inexperienced users. The user interaction surface can
be customized to accommodate user and program
requirements. Interactive development of complex
process flow simulations is supported. The TCAD
shell automatically performs implicit
parallelization and job control; the shell is also capable
of quick standard visualization. Interfaces to
advanced visualization tools, such as DEC AVS,
provide state-of-the-art graphics. An on-line
documentation system with automatic documentation
generation from the source code always guarantees
up-to-date information for users and programmers.

The Tool Aspect Within the VISTA package, a 
generic toolbox enables common data manipula- 
tions such as interpolation, gradient calculation, 
and arithmetical operations. The toolbox covers 
most of the standard data manipulations occurring 
in process and device simulation and thus allows 
program developers to focus on the main issues of 
their tasks. A tool abstraction concept helps with 
the automatic generation of front and back inter- 
faces for new simulation tools. Strict rules ensure a 
consistent extension of the VISTA system to new 
and complex simulation tools. 

General Aspects To face future challenges, a TCAD 
environment requires a consistent architecture 
together with high-level concepts. Modern soft- 
ware development techniques, such as automatic 
code generation and documentation, layered struc- 
tures, and a high abstraction level of the underlying 
concepts, are the basis on which VISTA is built. 

Present Status of VISTA within Digital's Develop- 
ment Work The most recent version of VISTA has 
been installed at the CEC in Vienna. Because of the 
close proximity of the CEC and the TUV, feedback 
on the concepts and the implementation details 
will directly influence the future progress of VISTA. 
This test version permits the application of the 
common data interface, i.e., the PIF application 
layer, and the simple coupling of the VISTA-PROMIS 
and VISTA-MINIMOS simulators, both of which are 
based on PROMIS and MINIMOS but have been fur- 
ther developed for use within a TCAD framework. 
It is expected that during 1993, VISTA will replace 
parts of the intermediate solutions for Digital's
needs. Integration of the existing tools such as VIDE 
into VISTA is in progress. 

Conclusions 

The use of process and device simulation tools
provides Digital's semiconductor process development
teams with the following benefits:

■ Decreases the number of experiments required
to optimize the fabrication process. Experimen- 
tal lots may take many months to process, 
whereas simulators can give results in minutes 
or hours. Simulation can never totally replace 
experimentation, however, because simulation 
is only a model of what we know about process 
and device physics, not reality itself. Smaller 
dimensions and new manufacturing processes 
require constant revision of our physical models 
and simulation tool capabilities. 

■ Allows the design teams to better estimate the 
spread of device performance, i.e., the so-called 
"worst case" conditions, before the process is 
well established. Thus, circuit designers can 
begin design earlier and with more confidence. 

■ Gives insight into the internal behavior of pro- 
cesses and devices to back up engineering judg- 
ment. For example, a layout design rule had been 
violated in an obscure way in the design of an 
I/O driver circuit. Redesign would have cost both 
time and space on the chip. Device simulation 
verified the device engineer's opinion that the 
violation would not cause any problem in actual 
practice. 
CMOS-4 Technology for Fast Logic
and Dense On-chip Memory 

Digital's fourth-generation CMOS technology has produced the industry's highest
performance microprocessors. The NVAX and Alpha 21064 chips are based on
0.75-µm, 3.3-V CMOS technology capable of producing operating frequencies of up to
100 MHz and 200 MHz respectively. The high-performance CMOS transistors consist
of a 105-Å gate oxide, symmetric n+ and p+ doped polysilicon for surface channel
conduction, low threshold voltage, and good turn-off characteristics. The transistor
has an on-wafer electrical gate length of 0.5 µm, a shallow medium doped drain
junction for hot electron immunity, a CoSi2 salicided gate, and source and drain
regions for low interconnect sheet resistance. A TiN/CoSi2 local interconnect scheme
was used to strap the drain and gate regions to form a six-transistor memory cell
with an area equivalent to 100 µm^2.



High-performance complementary metal-oxide 
semiconductor (CMOS) microprocessor design 
began in Digital's Hudson, Massachusetts, site in the 
mid-1980s. Digital's strategy calls for the scaling of 
feature sizes with each new generation of CMOS 
technology, coupled with larger die size to achieve 
higher circuit density and higher system perfor- 
mance. CMOS CPU operating frequency has doubled 
every two years since 1986, from 10 megahertz 
(MHz) for the CVAX chip fabricated with the CMOS-1 
technology, to 100 MHz for complex instruction 
set computer (CISC)-based architecture as demonstrated
by the NVAX chip.¹ It reached 200 MHz in
1991 for the Alpha 21064 reduced instruction set
computer (RISC) architecture.² On-chip caches
increased from 1 kilobyte (KB) for the CVAX chip 
in 1986 to a total of 16KB for the Alpha 21064 chip 
fabricated with CMOS-4 technology in 1991. 

The transistor count implemented on a single 
microprocessor chip beginning with CMOS-1 tech- 
nology swelled from 100,000 in 1986 to 1.7 million 
devices in 1991. During the same period of time, 
the polysilicon line that forms the transistor 
channel length was scaled from 2.0 microns (μm)
to 0.75 μm, and the gate oxide thickness was
decreased from 300 angstroms (Å) to 105 Å for the
CMOS-4 process. Concurrently, the power supply
was scaled from 5 volts (V) to 3.3 V to reduce power
dissipation and to provide extra protection against 
gate oxide wear-out mechanisms. 
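The scaling figures quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the "nominal field" is a simple supply-voltage-over-oxide-thickness estimate that ignores depletion and flat-band effects.

```python
# Values quoted in the text for CMOS-1 (1986) and CMOS-4 (1991).
CMOS1 = {"gate_length_um": 2.0, "oxide_angstrom": 300.0, "supply_v": 5.0}
CMOS4 = {"gate_length_um": 0.75, "oxide_angstrom": 105.0, "supply_v": 3.3}

ANGSTROM_TO_CM = 1e-8


def scale_factor(old_value, new_value):
    """Ratio by which a parameter shrank from the old to the new process."""
    return old_value / new_value


def nominal_oxide_field_mv_per_cm(supply_v, oxide_angstrom):
    """Supply voltage divided by oxide thickness, in MV/cm (simple estimate)."""
    return supply_v / (oxide_angstrom * ANGSTROM_TO_CM) / 1e6


if __name__ == "__main__":
    for key in ("gate_length_um", "oxide_angstrom", "supply_v"):
        print(key, round(scale_factor(CMOS1[key], CMOS4[key]), 2))
    for name, proc in (("CMOS-1", CMOS1), ("CMOS-4", CMOS4)):
        field = nominal_oxide_field_mv_per_cm(proc["supply_v"], proc["oxide_angstrom"])
        print(name, "nominal oxide field:", round(field, 2), "MV/cm")
```

Under this estimate, the line width and oxide thickness shrink by similar factors, while the supply voltage shrinks by a smaller one.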



High-performance microprocessors have posed 
a number of technological challenges in the design 
of transistors and on-chip caches. A number of 
process and device innovations have been intro- 
duced. For example, symmetric submicron n+ and 
p+ doped polysilicon transistors must provide 
low threshold voltage and good turn-off character- 
istics. Also, graded drain junctions must balance the 
driving current of the high-performance transistor 
with adequate hot carrier resistance. Self-aligned 
cobalt silicide (CoSi₂) polysilicon gate, and source
and drain regions must provide low interconnect 
sheet resistance for improved circuit density and 
performance. 

On-chip, high-density, static random-access mem- 
ory (SRAM) is required for high-performance CMOS 
microprocessors. Previous technology scaling 
restricted the size of the on-chip cache due to the 
relatively large cell area. A variety of techniques 
were proposed to improve the RAM density, includ- 
ing the four-transistor (4T) cell with a second-layer 
polysilicon resistor and the six-transistor (6T) cell 
with buried contact (BC) or local interconnect 
(LI). The most attractive technique was the local 
interconnect scheme that was used to fabricate the 
100-μm² cell, which is used successfully in the
Alpha 21064 and NVAX microprocessors. 

This paper describes the front-end process flow 
for Digital's fourth-generation 0.75-μm CMOS technology.
It emphasizes the methods used to form
the low-resistance CoSi₂ and the titanium-nitride-based
local interconnect used for the dense on-chip
cache. The paper also discusses the CMOS transistor 
design and device characteristics. It concludes with 
a description of the SRAM cell. 

Process Description 

The front end of the CMOS-4 process is divided 
into six process modules: well formation, device 
isolation, gate formation, medium doped drain
junction formation, CoSi₂ formation, and the local
interconnect process. CMOS-4 technology was built 
upon the previous CMOS generations. Additional 
steps were incorporated into the process flow to 
meet new circuit design and layout requirements. 
Process steps were modified to accommodate scal- 
ing of device dimensions. A schematic process flow 
that depicts the various process modules is shown 
in Figure 1. The CMOS-4 layout design rules are 
given in brief in Table 1. 




[Figure 1 panels: (a) Well Drive; (b) Initial Oxide/Nitride, Active Area Mask; (c) First Gate Oxide: Field Oxidation; (d) Second Gate Oxide: Polysilicon Deposition, Polysilicon Mask and Etch, Wet Reoxidation; (e) Spacer Oxide Deposition, Densification, and Etch; (f) Source/drain Drive; (g) Cobalt Silicide Formation; (h) Titanium Nitride Deposition, Pattern, and Etch; (i) Local Interconnect]



Figure 1 CMOS-4 Front-end Process Flow

Table 1 Layout Design Rules for CMOS-4 Process

Design Rule                                     Dimension
Minimum active area width                       1.5 μm
p+ to n+ spacing                                3.0 μm
n+/n+ or p+/p+ spacing                          0.75 μm
Polysilicon width/space                         0.75/0.75 μm
Metal contact to polysilicon or active area     0.75 μm



Well Formation 

Digital's CMOS technology uses an n-well formation 
process. Starting silicon wafers, or substrates, are 
doped p-type. The starting wafer is composed of 
a thin, high-resistivity epitaxial layer on a low- 
resistivity substrate (p on p+). A thick oxide is ther- 
mally grown on the wafer. Well regions are opened 
in photoresist using a photomasking process. The 
oxide is removed in the well regions. The n-well is 
formed by phosphorus ion implantation (approximately
10¹³ atoms per square centimeter [cm²])
into the open regions. A high-temperature diffu- 
sion step is performed to drive the n-well implant 
to a specified depth in the epitaxial layer. For the 
CMOS-4 process, the diffusion step was carefully 
adjusted to account for the closer spacing between 
p and n transistors and the thin epitaxial layer
thickness (see Figure 1a).

Epitaxial and well process requirements for 
Digital's CMOS-1 and CMOS-4 technologies are given 
in Table 2. The time required for well diffusion was 
shortened to 2.5 hours from 9 hours. Epitaxial 
thickness was reduced to 6.5 μm to improve latch-up
immunity. After the well diffusion step, all oxide
is removed from the wafer before formation of the 
device isolation regions. 
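To get a feel for how much the shorter, cooler well drive reduces dopant diffusion, one can compare the characteristic diffusion length, which scales as the square root of diffusivity times time. The sketch below is a rough estimate only: it assumes a simple Arrhenius diffusivity with an activation energy of 3.66 eV, an illustrative value that is not taken from this paper, and the function names are hypothetical. The prefactor cancels when two processes are compared.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5
ASSUMED_ACTIVATION_ENERGY_EV = 3.66  # illustrative assumption, not from the paper


def relative_diffusion_length(temp_c, hours, ea_ev=ASSUMED_ACTIVATION_ENERGY_EV):
    """sqrt(D * t) up to a constant factor; only ratios are meaningful."""
    temp_k = temp_c + 273.15
    relative_diffusivity = math.exp(-ea_ev / (BOLTZMANN_EV_PER_K * temp_k))
    return math.sqrt(relative_diffusivity * hours)


def drive_in_ratio():
    """CMOS-1 (1130 C, 9 h) versus CMOS-4 (1100 C, 2.5 h), values from Table 2."""
    cmos1 = relative_diffusion_length(1130.0, 9.0)
    cmos4 = relative_diffusion_length(1100.0, 2.5)
    return cmos1 / cmos4


if __name__ == "__main__":
    print("CMOS-1 / CMOS-4 diffusion length ratio:", round(drive_in_ratio(), 2))
```

With the assumed activation energy, the CMOS-1 drive yields a diffusion length between two and three times that of the CMOS-4 drive. A process simulator such as SUPREM would be needed for actual well depths.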

Device Isolation 

Isolation regions between devices are formed using 
a conventional, semirecessed local oxidation of silicon
scheme referred to as LOCOS isolation.³ After
the pad oxide is grown and the silicon nitride is
deposited, device areas (or active areas) are imaged
in photoresist. Silicon nitride is removed from the 
isolation regions (or field regions) and retained on 
the active areas. In the substrate field regions, it is 
necessary to increase the concentration of p-type 
dopant to prevent unwanted current flow between 
devices. Boron is selectively implanted (approximately
10¹³ atoms per cm²) into the substrate field
regions and is blocked from entering the well
regions by the photoresist layer. Note also that the
nitride not covered with photoresist must be thick
enough to block the implant (see Figure 1b). Next,
a thick thermal oxide of approximately 0.45 μm is
grown in the field regions. The nitride, however,
prevents oxidation of the active area. 

The LOCOS isolation has been tailored to CMOS-4 
dimensions. Narrow-width transistors, routinely 
used in dense SRAM layouts, are particularly sensitive
to isolation process conditions. Pad oxide and
silicon nitride thicknesses have been selected to 
minimize excessive lateral oxidation of the active 
area, which could cause a reduction in the physical 
transistor width and thus lower the saturation cur- 
rent. The oxidation temperature and time must be 
optimized to grow the desired field oxide thickness 
without negative impact on other device parame- 
ters. For CMOS-4, the lateral oxide encroachment 
was reduced to 0.25 μm (per side). To minimize
undesirable lateral diffusion of the field dopant, a
relatively low temperature (950 degrees Celsius) is
used for field oxidation (see Figure 1c).

After field oxidation, the nitride is chemically 
removed from the active areas, and the pad oxide is 
chemically stripped. The field oxide regions are 
semirecessed. Approximately 50 percent of the 
field oxide is below the silicon surface due to sili- 
con consumption during the oxide growth. 
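The isolation numbers above translate directly into device geometry. The following sketch is a minimal illustration with hypothetical function names. It assumes that the per-side encroachment simply subtracts from the drawn active width on both sides and that the recess is the stated fraction of the field oxide thickness.

```python
def physical_width_um(drawn_width_um, encroachment_per_side_um=0.25):
    """Active width left after lateral oxidation eats into both edges."""
    return drawn_width_um - 2.0 * encroachment_per_side_um


def recess_depth_um(field_oxide_um=0.45, fraction_below_surface=0.5):
    """Depth of field oxide below the original silicon surface."""
    return field_oxide_um * fraction_below_surface


if __name__ == "__main__":
    # 1.5 um is the minimum active area width given in Table 1.
    print("Physical width of a 1.5-um drawn device:", physical_width_um(1.5), "um")
    print("Field oxide recess:", recess_depth_um(), "um")
```

For a minimum-width 1.5-μm device, this leaves about 1.0 μm of physical width, which is why narrow SRAM transistors are sensitive to the isolation process.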

Gate Formation 

To achieve the desired electrical and reliability 
behavior, microcontamination must be controlled 
during gate region formation. The gate oxide mate- 
rial must be free of any defects and contain minimal 
amounts of ionic and metallic impurities. Rigorous
chemical cleaning of the wafer is required before
gate oxide growth. Gettering (collecting) of mobile
ions is performed during gate oxidation by the use
of chlorine species. The wafer is transferred from
gate oxide to polysilicon deposition immediately to
minimize exposure of the dielectric film to airborne
contamination.

Table 2 Epitaxial and Well Diffusion Process for Two CMOS Technologies

Technology   Epitaxial Thickness   Well Diffusion Temperature   Well Diffusion Time
CMOS-1       11 μm                 1130 degrees Celsius         9 hours
CMOS-4       6.5 μm                1100 degrees Celsius         2.5 hours

Channel Doping A 450-Å thermal oxide is grown
on the active areas to condition or remove impuri- 
ties from the silicon surface. Often referred to as 
a "sacrificial" oxide, the thermal oxide is etched 
from the silicon surface before the real gate oxide 
is grown. Before the growth of the gate oxide, 
channel dopant for p- and n-type transistors is 
implanted through the sacrificial oxide. Photo- 
masking steps are done to selectively implant 
boron into n-channel regions and phosphorus into 
p-channel regions. For n-channel regions, two sepa- 
rate boron implants are done: a shallow implant 
(approximately 10¹² atoms per cm² at 20 kiloelectron
volts [keV]) for threshold voltage adjustment
and a deep implant (approximately 10° atoms per
cm², 110 keV) to guard against various punch-through
mechanisms.

Because the p-channel transistor is a surface channel
device, its channel region receives only a phosphorus
threshold adjust implant (approximately 10¹² atoms per cm²
at 100 keV); the well concentration is sufficient
for proper subthreshold operation.

Gate Oxide and Gate Electrode Next, the sacrificial
oxide is removed, and a thin, 105-Å gate oxide is
grown at 900 degrees Celsius. Then polycrystalline
silicon of 3500-Å thickness is deposited, patterned,
and etched to define the CMOS n-channel and
p-channel transistor gate electrodes (see Figure 1d).
The requirement for appropriate symmetric device 
design for the n- and p-channel devices is discussed 
in greater detail in the Symmetric Device Require- 
ment section of this paper. 

Medium Doped Drain Junction Formation 
Next, a thin, 170-Å silicon dioxide layer is grown on
the polysilicon and the active area regions. This
layer is followed by a photoresist step that defines
the n-channel device. Phosphorus is implanted
(approximately 10¹³ atoms per cm² at 25 keV) to
form a shallow, 0.1-μm, medium doped drain (MDD)
junction for hot carrier protection⁴ and good
turn-off characteristics. The p-channel junction is
formed next. A photomasking step defines the
p-channel region, followed by a shallow boron
difluoride (BF₂) implant (approximately 10¹⁵ atoms
per cm² at 50 keV).

Spacer Formation A 2000-Å silicon dioxide layer
is deposited and densified at 850 degrees Celsius.
The oxide is reactive ion etched. This particular
etch is anisotropic and leaves a vertical oxide sidewall
2000 Å wide along the 3500-Å thick polysilicon
lines. The MDD junction is located under
the spacer oxide wall and extends 2000 Å toward
the drain contact region (see Figure 1e). The oxide
spacer protects the MDD junction from receiving
the heavy dose source and drain implants, and it
protects the gate oxide during the CoSi₂ formation.

Source and Drain Junction Formation After the 
spacer formation, the photoresist step that defines 
the n-channel transistor is repeated. An arsenic 
implant (5 × 10¹⁵ atoms per cm², 100 keV) forms the
n+ source/drain junction. A final high-temperature 
drive is performed to anneal the implant damage 
and to drive the junctions for the n-channel and 
p-channel devices to a final depth of 0.22 μm (see
Figure 1f).

SUPREM-based simulations⁵ of the CMOS-4 channel,
MDD, and source and drain doping profiles are
shown in Figure 2a for the n-channel device and in
Figure 2b for the p-channel device. The channel
surface concentration for both devices is approximately
1 × 10¹⁷ atoms per cm³; the MDD junction
depth is 0.1 μm; and the n+ and p+ junction depths
are both 0.22 μm.

Cobalt Silicide Module 

The CMOS-4 process technology relies on
self-aligned CoSi₂ (salicide) to improve device performance
by reducing the parasitic RC delay time
associated with the gate and active area interconnects.
CoSi₂ also provides good ohmic characteristics
for metal contacts to gates and active areas,
and acts as an etch stop during metal contact formation.⁶
The term "salicide" refers to the formation
of silicide on the gate and the source and drain
region without the use of a masking layer. Because 
the silicide forms only in areas where the deposited 
metal can react with the exposed silicon surface, 
no silicide forms over silicon dioxide areas. 

Prior to CoSi₂ formation, a wet chemical clean
removes any surface contamination, and a hydrofluoric
acid dip removes any residual silicon dioxide
(approximately 100 Å). The wafers are then
introduced into a multistation, high-vacuum (down
to 5 × 10⁻⁸ torr) sputtering system where another
100 Å of silicon dioxide is etched. This is followed
by a sputter deposition of approximately 200 Å of
pure cobalt film on the surface of the wafer.

Figure 2 SUPREM Process Simulations
for CMOS-4 Channel Devices

The initial high-resistivity phase of CoSi is
formed using a rapid thermal annealer. Each wafer
is annealed at approximately 475 degrees Celsius
for 90 seconds in nitrogen gas. Next, the wafers are
immersed in a selective etch, which is based on
phosphoric acid, to remove all the unreacted cobalt
on the silicon dioxide without attacking the already
formed CoSi or silicon dioxide. This 30-minute etch
is self-limiting (once all the unreacted cobalt is
removed, the reaction stops as the acid etches
cobalt and nothing else). A one-minute, 700-degree
Celsius anneal in nitrogen is then performed to
form 700 Å of CoSi₂ with sheet resistance at approximately
5 ohms per square (see Figure 1g).
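The silicide numbers above can be related to one another with the standard thin-film relations: resistivity equals sheet resistance times thickness, and the resistance of a line equals sheet resistance times its length-to-width ratio (the number of "squares"). The sketch below is illustrative only. The function names are hypothetical, and the 75-μm line is an invented example, not a structure from the paper.

```python
ANGSTROM_TO_CM = 1e-8


def resistivity_uohm_cm(sheet_ohm_per_sq, thickness_angstrom):
    """Film resistivity in micro-ohm cm from sheet resistance and thickness."""
    return sheet_ohm_per_sq * thickness_angstrom * ANGSTROM_TO_CM * 1e6


def line_resistance_ohm(sheet_ohm_per_sq, length_um, width_um):
    """Resistance of a uniform line: sheet resistance times number of squares."""
    return sheet_ohm_per_sq * length_um / width_um


def silicide_to_cobalt_thickness_ratio(silicide_angstrom=700.0, cobalt_angstrom=200.0):
    """Final CoSi2 thickness per unit of deposited cobalt, from the text's values."""
    return silicide_angstrom / cobalt_angstrom


if __name__ == "__main__":
    print("CoSi2 resistivity:", round(resistivity_uohm_cm(5.0, 700.0), 1), "uohm cm")
    # Hypothetical 75-um long, 0.75-um wide silicided line (100 squares).
    print("Example line:", line_resistance_ohm(5.0, 75.0, 0.75), "ohm")
    print("Thickness ratio:", silicide_to_cobalt_thickness_ratio())
```

The 5 ohms per square and 700 Å quoted in the text correspond to a film resistivity of roughly 35 μohm × cm.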

Figure 3 shows a cross-sectional photomicrograph
of CoSi₂ film formed simultaneously over
the polysilicon gate and the active area region. The
CoSi₂ film thickness is approximately 700 Å. The
spacer oxide separates the silicide in the active area
from the silicide on the polysilicon gate.

Figure 3 Cross Section of CMOS-4 Process
Salicide CoSi₂ over Polysilicon
and Active Area

Local Interconnect Process 
The requirements for local interconnect material⁷
include low-resistivity film with good etch selectivity
to the underlying silicide film. Also, the film must
have good electrical contact resistance to both the
silicided gates and silicided source and drain
regions. The material selected was a laminate of
200 Å of titanium and 600 Å of low-resistivity
titanium nitride (TiN). Titanium reduces the contact
resistance through a chemical reaction with
any surface oxides over the CoSi₂ to form a good
ohmic contact. The titanium nitride has a higher
resistivity, but is more chemically stable and does
not oxidize as readily as titanium during subsequent
processing steps, such as oxygen plasma
strips. Titanium nitride is also much easier to pattern
with photoresist for the subsequent reactive
ion etch step.

Local Interconnect Deposition The local interconnect
is deposited using the same high-vacuum
sputtering system used for cobalt deposition. The
titanium and the titanium nitride are sequentially
deposited in the same chamber. Nitrogen is reacted
with the titanium on the target surface to form titanium
nitride. Subsequently, the titanium nitride is
sputter deposited onto the wafer. Varying the nitrogen
flow can control the chemical and electrical
properties of the titanium nitride film. The deposition
process is adjusted to produce TiN with a resistivity
of 50 μohm × cm.
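As a rough cross-check of the TiN film, the sheet resistance of a single uniform layer is its resistivity divided by its thickness. The sketch below applies this to the 600-Å TiN layer alone. It is a simplification that ignores the 200-Å titanium underlayer and any reaction at the interfaces, so it should not be expected to reproduce a measured value exactly. The function name is hypothetical.

```python
ANGSTROM_TO_CM = 1e-8


def sheet_resistance_ohm_per_sq(resistivity_uohm_cm, thickness_angstrom):
    """Sheet resistance of one uniform film: resistivity / thickness."""
    resistivity_ohm_cm = resistivity_uohm_cm * 1e-6
    return resistivity_ohm_cm / (thickness_angstrom * ANGSTROM_TO_CM)


if __name__ == "__main__":
    tin_only = sheet_resistance_ohm_per_sq(50.0, 600.0)
    print("TiN layer alone:", round(tin_only, 1), "ohms per square")
```

The result for the TiN layer alone is on the order of a few ohms per square, comparable to the silicided layers it straps together.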

Local Interconnect Etch Local interconnect 
etch consists of patterning TiN interconnects 
between source/drain and unrelated polysilicon 
(see Figure 1h). At this layer, there are two main
concerns: (1) complete removal of TiN residue 
(stringers), and (2) etch selectivity to the other 
exposed materials on the wafer. 

At this point in the fabrication process, the TiN 
is 4.5 times thicker at the side of an oxide spacer 
than over a flat region on the wafer. Continued 
etching to remove the TiN residue by the spacer 
would result in unacceptable overetching in the flat 
regions. Due to the conformality of the TiN to the 
spacer, the width of the TiN at the side of the spacer 
is the same as the thickness of the TiN in the flat 
regions. By controlling the ratio of the anisotropic 
etch component to the isotropic etch component, 
a portion of the TiN residue can be removed later- 
ally, which significantly reduces the amount of 
overetch required. This combination of vertical 
and lateral etching is accomplished by using a 
chlorine/trifluoromethane (Cl₂/CHF₃) plasma chemistry
at low pressure.
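The benefit of adding a lateral etch component can be illustrated with a toy model. The sketch below normalizes the flat-region TiN thickness and the vertical etch rate to 1, takes the sidewall residue to be 4.5 units tall and 1 unit wide as described above, and assumes the residue clears as soon as either the vertical or the lateral etch front has consumed it. This is a deliberately simplified assumption (real profiles evolve in two dimensions), and the function name is hypothetical.

```python
def overetch_percent(sidewall_ratio=4.5, lateral_to_vertical_rate=0.0):
    """Overetch needed to clear sidewall residue, as a percent of the main etch time.

    Thickness and vertical rate are normalized to 1, so the flat film clears at t = 1.
    """
    vertical_clear_time = sidewall_ratio  # etching down through the tall residue
    if lateral_to_vertical_rate > 0.0:
        lateral_clear_time = 1.0 / lateral_to_vertical_rate  # residue is 1 unit wide
        clear_time = min(vertical_clear_time, lateral_clear_time)
    else:
        clear_time = vertical_clear_time
    return max(clear_time - 1.0, 0.0) * 100.0


if __name__ == "__main__":
    for ratio in (0.0, 0.25, 0.5, 1.0):
        print("lateral/vertical =", ratio, "->", overetch_percent(4.5, ratio), "% overetch")
```

In this model a purely anisotropic etch needs several times the main etch time as overetch, while even a modest lateral rate cuts that sharply, which is the qualitative behavior the text describes.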

During the TiN etch, four other materials are 
exposed to the reactive plasma: silicon oxide,
CoSi₂, photoresist, and silicon. To minimize material
loss, the TiN etch rate must be optimized to
significantly exceed the etch rates of these other 
materials. This is accomplished by using response 
surface methodology. 

The optimized etch process consists of a two-step
etch. The first step has an anisotropic characteristic.
The pressure is 25 millitorr (mtorr);
bias voltage is −220 V; Cl₂ is 20 standard cubic centimeters
per minute (sccm); CHF₃ is 30 sccm; and
boron trichloride (BCl₃) is 120 sccm. The second
step is a more isotropic overetch step to remove
residual TiN. It consists of a 25-mtorr pressure and
a −94-V bias voltage. The Cl₂ is 60 sccm; CHF₃ is
40 sccm; and BCl₃ is 90 sccm. The temperature of
the cathode is maintained at 50 degrees Celsius.
The photoresist is stripped using a three-step process.
The first step is a wet solvent strip that
removes any soluble residues and plasma-hardened
resist from the wafer. The second step is an oxygen
plasma strip using a single-wafer stripper. The final
step is another wet solvent strip (see Figure 1i).

Figure 4 is a photomicrograph of the top view of 
the active area and polysilicon regions. It shows the 
thin layer of TiN local interconnect strap that short- 
circuits the two layers together. 



Figure 4 Photomicrograph Showing the Top 
View of the Active Area, Polysilicon, 
and TiN Local Interconnect 

Transistor Design Considerations 

Figure 5 shows a cross section of a CMOS-4 transis- 
tor. It highlights the polysilicon gate region, spacer 
region, CoSi₂ region, and the metal contact regions
filled with TiN and tungsten films. 

Unless adequately designed, submicron CMOS
devices suffer many undesirable electrical effects.
These effects are related to the scaling of the transistor
channel length and the gate oxide thickness.
Scaling the effective channel length for both n- and
p-channel devices requires a reduced junction
depth and an increased channel surface concentration
aimed at improving the short channel effects.
These solutions may also contribute to an increase
in the built-in electric field, an increase in the
impact ionization, an increase in the substrate current,
and a decrease in the punch-through voltage.

Figure 5 Cross Section of CMOS-4
0.75-μm Transistor

Arsenic doping profiles are usually used to fabri- 
cate shallow junctions in n-channel devices. They 
produce a high electric field that can cause a high 
rate of impact ionization between the electrons 
injected from the source and the fixed ions in the 
depletion region of the drain junction. On impact, 
electron-hole pairs are generated. The holes are 
swept to the source or substrate region and are 
known as the substrate current. The electrons subjected
to the high drain electric field can gain
enough energy to be injected into the gate oxide region
above the drain junction. These "hot electrons" may
create interface states or may lose enough energy
and be trapped in some defect location. Long-term
effects of the trapping mechanism give rise to transconductance
and saturation current degradation,
which eventually lead to circuit failure.

N-channel Junction Formation 
To protect against the reliability hazards associated 
with arsenic profiles, a graded junction was implemented.
Heavy emphasis was placed on the SUPREM
process simulator and the MINIMOS device simulator.⁸
The use of these simulation tools allowed us
to accurately predict the device behavior under certain
electrical conditions, and ensured that the
device characteristics were optimized for high performance
and superior reliability. Use of these simulators
also reduced the dependence on wafer
usage for process optimization.



The two-dimensional device optimization
resulted in the following graded junction process
parameters: a phosphorus dose of 7 × 10¹³ atoms
per cm² at 25 keV, diffused to a 0.1-μm junction
depth with a spacer width of 0.15 to 0.2 μm.
The phosphorus surface concentration was set
to approximately 10¹⁹ atoms per cm³ to reduce the
source and drain series resistance, Rs, and to accomplish
the highest possible saturation current, Idsat,
while maintaining low substrate current for
improved hot carrier reliability. The graded phosphorus
junction was called medium doped drain
(MDD) to refer to the relatively high doping concentration
(1 × 10¹⁹ atoms per cm³). In contrast, the
lightly doped drain (LDD) junction, with doping in
the 10¹⁷ to 10¹⁸ atoms per cm³ range, suffers from
large Rs and low driving current capability.

By implementing the MDD process, we were able
to accomplish the following n-channel device characteristics:
(1) minimum device lifetime of 20 years
at a drain voltage bias (Vds) of 4.3 V, (2) source and
drain series resistance of 0.05 ohm × cm, (3) driving
current capability Idsat of 0.385 milliampere (mA)
per μm, and (4) punch-through voltage (BVDSS)
above 7 V. See Table 3.



Table 3 Electrical Parameters for CMOS-4 Transistors

Parameter   N-channel          P-channel
MDD Xj      0.1 μm             NA
P+ Xj       NA                 0.18 μm
Xwell       NA                 1.75 μm
Tox         105 Å              105 Å
Vt          0.5 V              −0.5 V
Leff        0.5 μm             0.5 μm
Delta L     0.25 μm            0.25 μm
Delta W     0.55 μm            0.65 μm
Idsat       0.385 mA per μm    −0.167 mA per μm
BVDSS       >7 V               >−7 V



Figure 6 shows the results of two-dimensional
MINIMOS simulation of the lateral electric field distribution
in the CMOS-4 n-channel device with MDD
junction profiles. The electric field is plotted along
the transistor channel region starting from the
source toward the drain region. The device has a
drawn polysilicon length of 0.75 μm and a gate
oxide thickness of 105 Å. The drain voltage biases,
Vds, were set to 4.3 V and 3.3 V, and the gate voltage,
Vgs, was set to 2.2 V. Notice that the peak field for
3.3-V operation is approximately 0.3 × 10⁶ V per cm
compared to the worst-case field condition of
0.4 × 10⁶ V per cm, where Vds equals 4.3 V.

[Figure 6 plot: lateral electric field versus length (μm); key: Vds = 3.3 V, Vds = 4.3 V]

Figure 6 N-channel MINIMOS Simulation
of Lateral Field for Medium Doped
Drain Junctions

P-channel Junction Formation
The p-channel device requires special care in the
design of the p+ source and drain junction depth to
ensure that boron does not penetrate the thin gate
oxide and enter the n-well region. The p+ junction
dose, junction depth, and temperature cycle were
optimized to accomplish a low Rs by using BF₂
(1 × 10¹⁵ atoms per cm² and an energy of 50 keV). In
addition, the diffusion time was minimized, and
good threshold voltage control and punch-through
protection were maintained. The boron distribution
in polysilicon obtained with extensive secondary
ion mass spectroscopy (SIMS) analysis, as
well as negative bias and temperature instability
(NBTI) tests, ruled out any boron penetration.

Symmetric Device Requirement
High-performance submicron CMOS technologies
require the simultaneous optimization of the
n-channel and p-channel devices for high driving
current capability and excellent short-channel
device characteristics. This is best accomplished
with symmetric design where channel doping,
junction depth, and threshold voltage of both transistors
are equivalent.⁹ Furthermore, the n-channel
has an n-type doped polysilicon formed during the
n+ source/drain junction and the p-channel has a
p-type doped polysilicon formed during the p+
source/drain junction. The technique of symmetrically
designing the devices allows the threshold
voltages to be equal in magnitude, and suitably low
for high driving current capability while maintaining
good punch-through characteristics.

Figure 7 shows the results of the MINIMOS simulation
of the potential distribution in CMOS-4 transistors
with p-type doped polysilicon gate. The
bias points were set for Vgs equals 0 V and Vds equals
−3.3 V. The channel length was 0.75 μm, and the
gate oxide was 105 Å. Superior punch-through characteristics
are observed since the potential contour
lines do not spread significantly toward the source.

[Figure 7 plot: potential contours versus length (μm); key: −3.45, −3.02, −2.59, −2.16, −1.73, −1.31, −0.877, −0.449, 0.02]

Figure 7 P-channel MINIMOS Simulation
of Potential Distribution

Device Characteristics 

The extrapolated threshold voltage for an n-channel
device plotted as a function of gate length is shown
in Figure 8. Excellent threshold voltage control
is shown for channel length down to 0.5 μm. The
n-channel drain current, Ids, is plotted in Figure 9 as
a function of drain voltage, Vds, while Vgs is varied
from 0 to 5 V with 0.5-V steps. In Figure 10, the drain
current is plotted on a logarithmic scale as a function
of gate voltage to highlight the subthreshold
slope behavior for Vds of 0.1 V and 3.6 V. The subthreshold
slope was measured to be 86 mV per
decade and is characterized by good drain-induced,
barrier-lowering characteristics. The drawn dimensions
of the transistor are 12.5 μm wide by 0.75 μm
long.

Figure 8 N-channel Threshold Voltage Plotted
as a Function of Effective Channel
Length

Figure 9 N-channel Drain Current Plotted
as a Function of Drain Voltage
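The measured subthreshold slope can be put in context with the textbook relation S = n (kT/q) ln 10, where n is an ideality (body-effect) factor. The sketch below is an illustration under stated assumptions: it assumes a temperature of 300 K, which the paper does not state, and uses the 0.5-V threshold voltage listed in Table 3. The function names are hypothetical.

```python
import math

KT_OVER_Q_PER_KELVIN = 8.617e-5  # volts per kelvin (Boltzmann constant / electron charge)


def ideal_slope_mv_per_decade(temp_k=300.0):
    """Subthreshold slope for an ideality factor of 1."""
    return KT_OVER_Q_PER_KELVIN * temp_k * math.log(10.0) * 1000.0


def ideality_factor(measured_mv_per_decade, temp_k=300.0):
    """Ratio of the measured slope to the ideal slope at the assumed temperature."""
    return measured_mv_per_decade / ideal_slope_mv_per_decade(temp_k)


def decades_below_threshold(threshold_v, slope_mv_per_decade):
    """Decades of current reduction between the threshold voltage and 0 V gate bias."""
    return threshold_v * 1000.0 / slope_mv_per_decade


if __name__ == "__main__":
    print("Ideal slope at 300 K:", round(ideal_slope_mv_per_decade(), 1), "mV/decade")
    print("Ideality factor for 86 mV/decade:", round(ideality_factor(86.0), 2))
    print("Decades of turn-off for Vt = 0.5 V:", round(decades_below_threshold(0.5, 86.0), 1))
```

Under these assumptions, the 86-mV-per-decade slope implies nearly six decades of current reduction between threshold and zero gate bias.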

Similar characteristics are observed for the 
p-channel device and are shown in Figures 11, 12, 
and 13. The subthreshold current conduction and 
punch-through characteristics are very similar to 
those of the n-channel device. 



Figure 10 N-channel Drain Current Plotted 
as a Function of Gate Voltage 

Table 3 shows typical CMOS-4 transistor process
and device parameters. The junction depths, Xj,
and the n-well depth, Xwell, are simulated with the
SUPREM process simulator and verified with SIMS
analysis. Tox is the physical gate oxide thickness;
Vt is the extracted threshold voltage; Leff is the nominal
final channel length, and delta L and delta W are
electrically extracted using the Terada method,
which accounts for the parasitic series resistance.
Idsat is the saturation current measured with the
drain and gate voltage at 3.3 V. BVDSS is the punch-through
voltage measured with Vgs set at 0 V.
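The delta L and delta W entries relate drawn dimensions to electrical ones. The sketch below assumes the usual convention that each offset is subtracted from the drawn dimension, which is consistent with the Leff of 0.5 μm listed for a 0.75-μm drawn gate. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedOffsets:
    """Electrically extracted offsets from Table 3, in microns."""
    delta_l_um: float
    delta_w_um: float


N_CHANNEL = ExtractedOffsets(delta_l_um=0.25, delta_w_um=0.55)
P_CHANNEL = ExtractedOffsets(delta_l_um=0.25, delta_w_um=0.65)


def effective_dimensions(drawn_w_um, drawn_l_um, offsets):
    """Return (effective width, effective length) in microns."""
    return (drawn_w_um - offsets.delta_w_um, drawn_l_um - offsets.delta_l_um)


if __name__ == "__main__":
    # The 12.5-um by 0.75-um drawn transistor described in the text.
    for name, offsets in (("n-channel", N_CHANNEL), ("p-channel", P_CHANNEL)):
        w_eff, l_eff = effective_dimensions(12.5, 0.75, offsets)
        print(name, "Weff =", round(w_eff, 2), "um, Leff =", round(l_eff, 2), "um")
```

For wide devices the width offset is a small correction, but for the 1.5-μm minimum-width SRAM transistors it removes more than a third of the drawn width.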

Silicided Interconnects Characteristics 
Table 4 shows the effects of the salicide process on
the parasitic resistance in four consecutive technologies.
The CMOS-1 process uses no silicided gate
or drain and therefore is expected to have a high
interconnect sheet and contact resistance. CMOS-2,
on the other hand, uses a low sheet resistance, tungsten
silicided (WSi₂) polysilicon gate with a sheet resistance
of 3 ohms per square. The CMOS-3 and CMOS-4 technologies
use salicided, low sheet resistance CoSi₂
for both the polysilicon gate and the source/drain
region with a sheet resistance of 5 ohms per square.

SRAM Implementation 

A six-transistor (6T) cell was selected for its process
simplicity and cell stability. To provide a dense,
cost-effective SRAM capability, the 6T cell was
chosen over the 4T cell, which requires complex,
two-level polysilicon films.

Figure 11 P-channel Threshold Voltage Plotted
as a Function of Effective
Channel Length

Figure 12 P-channel Drain Current Plotted
as a Function of Drain Voltage

During the initial 6T cell process development,
the TiN local interconnect scheme was considered
advantageous over the buried contact scheme. In the
buried contact procedure, the gate oxide is patterned
and etched, and then a polysilicon gate is
deposited to provide the contact between polysilicon
and the active area. This technique allows
the polysilicon film to access the source/drain
region without the need for area-consuming metal
contact. Unfortunately, this technique is not readily
compatible with symmetric n+ and p+ doped polysilicon
structures. In addition, silicon grooves
might form during polysilicon etch, which could
jeopardize the junction integrity and cause leakage
or short circuits to the silicon substrate.

Figure 13 P-channel Drain Current Plotted
as a Function of Gate Voltage

The preferred method to access the source/drain
region was the use of a TiN strap over CoSi₂. The TiN
local interconnect is a conductive material. After it is
sputter deposited on the wafer, it can be patterned
and etched to strap the node of one transistor to
the gate or drain of another transistor. Also, the TiN
local interconnect provides excellent etch selectivity
to the underlying CoSi₂ material. The TiN local
interconnect process proved superior to the buried
contact scheme because the improved etch selectivity
to CoSi₂ prevents junction leakage.

In standard layout techniques, the metal 1 contact
is spaced 0.75 μm from the edge of the
polysilicon gate and the isolation. This spacing
results in a 2.25-μm wide active area, as shown in
Figure 14a. In contrast, the local interconnect technique
does not require a contact region; therefore
the active area width can be scaled to 1.5 μm, as
shown in Figure 14b. The use of local interconnect
has reduced the 6T cell area from 120 μm² (no LI)
to 100 μm² (with LI). In addition, the use of LI
has improved yield due to a relaxed metal 1 contact
requirement and metal spacing.

Table 4 Sheet Resistances for CMOS Technologies (Ohms per Square)

                                CMOS-1   CMOS-2   CMOS-3   CMOS-4
                                N+/P+    N+/P+    N+/P+    N+/P+
Source/drain sheet resistance   40/75    40/75    5        5
Polysilicon sheet resistance    40/NA    3        5        5
Local interconnect              NA       NA       NA       6

[Figure 14 panels: (a) Standard Layout, showing the active area and metal contact; (b) Local Interconnect Layout, showing the local interconnect and active area, with dimensions of 0.75 and 1.5 μm]

Figure 14 Layout Schematics Comparing
Metal Contact with Local
Interconnect

Figure 15a is a photomicrograph of a CMOS-4 6T
SRAM cell taken after LI etch and photoresist strip.
It highlights the active area regions covered with
CoSi2 and TiN LI straps. Figure 15b shows a layout of
the SRAM cell used in the Alpha 21064 microprocessor.
Transistors T1, T2, etc., are highlighted
in Figures 15 and 16 to simplify their identification.
The cell measures 6.75 by 14.8 µm (100 µm²). The cell
uses only three metal 1 contacts (VDD and VSS) to
active area regions, compared to eight contacts in
the cell with no LI. The minimum transistor width
in the cell is 1.5 µm.

Summary 

CMOS-4 technology for the Alpha 21064 and the 
NVAX microprocessors was discussed in detail. 
Process and device features for fast logic and 
dense on-chip SRAM were presented. The high- 
performance transistor requires the simultaneous 
optimization of the drain junction for hot carrier 
resistance and for high driving current capability. 
Low-resistance silicided interconnect uses a robust
CoSi2 process. On-chip SRAM based on a 6T cell
with TiN local interconnect provides high-density
and high-yield products.

CMOS-4 Back-end Process
Development for a VLSI 0.75-µm
Triple-level Interconnection
Technology

Digital's CMOS-4 on-chip interconnect technology, developed for and used in
production of the NVAX and the Alpha 21064 microprocessor chips, is a three-level
aluminum alloy metallization process, with planarized TEOS-based silicon dioxide
dielectrics, tungsten-filled contacts and vias, and a minimum feature size of
0.75 µm. The process development effort was a twofold approach based on the
maximum use of existing manufacturing capability and the introduction of required
new process features. For photolithography, plasma etch, and PVD metallization,
the 1.0-µm manufacturing equipment set and processes were modified and
reoptimized for the submicron regime. In addition, two new process features, a blanket
CVD tungsten process and a TEOS-based oxide planarization process, were developed
and implemented in manufacturing to meet the CMOS-4 technology requirements.



Each generation of Digital's complementary metal- 
oxide semiconductor (CMOS) very large-scale inte- 
gration (VLSI) microprocessor development has the 
goal of providing a 30 percent net incremental per- 
formance improvement and a twofold area density 
improvement from the previous technology. This 
logic design need for higher density and improved 
performance places a considerable demand on 
ultra-large-scale integration (ULSI) circuitry to pro- 
vide processes that permit a scaling of horizontal 
geometries with vertical film thicknesses remaining 
constant. 1 

The technology goals for fourth-generation CMOS
(CMOS-4) were met by providing a 25 percent
algorithmic reduction of horizontal feature size from
1.0 micron (µm) to 0.75 µm, accompanied by minimal
or no reduction in back-end interconnect or
dielectric thicknesses. The process materials and
critical parameters are described in Table 1. The
small spatial resolution required vertical-walled
vias to access smaller-pitched metal layers
efficiently. Interconnect reliability was maintained
through implementation of a tungsten-filled via
plug to improve current spreading and to maintain
metal step coverage into contacts. 2 The CMOS-4
interconnect structure is shown in Figure 1.
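
As a rough check on how the linear shrink relates to the density goal, the following sketch computes the area factor implied by a purely geometric shrink. It assumes ideal scaling in both horizontal dimensions and nothing else; the function name is illustrative.

```python
# Sketch: area scaling implied by a uniform linear shrink (idealized geometry only).
def area_scale(linear_shrink_percent):
    """Fraction of the original area left after shrinking both horizontal dimensions."""
    k = 1.0 - linear_shrink_percent / 100.0
    return k * k


area_factor = area_scale(25.0)    # 1.0 um -> 0.75 um
density_gain = 1.0 / area_factor  # devices per unit area, relative to CMOS-3

print(f"area factor:  {area_factor}")
print(f"density gain: {density_gain:.2f}x")
```

On geometry alone, a 25 percent linear reduction gives about a 1.78-fold density gain, somewhat short of the twofold goal stated earlier. The text does not break down how the remainder is obtained.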

The CMOS-4 process was developed in a manufacturing
fabrication clean room originally configured
for the preceding 1.0-µm CMOS-3 technology. The
goal of the Advanced Semiconductor Development 
(ASD) and Manufacturing Engineering Groups was 
to introduce as few process changes and new 
pieces of equipment as possible. For two of the pro- 
cesses, joint development efforts at equipment ven- 
dor sites were conducted to develop hardware and 
assess process feasibility. The equipment was pur- 
chased and installed in the manufacturing clean 
room with final process characterization and inte- 
gration performed at Digital's Hudson facility. 

This paper discusses how the existing tools 
were modified for use in the CMOS-4 process in 
the areas of photolithography, plasma etch, and 
physical vapor deposition (PVD) metallization. It 
describes the addition of blanket tungsten and 
plasma-enhanced tetraethylorthosilicate (PE-TEOS) 
oxide processes that were developed in clustered, 
multichamber tools and optimized to meet the sub- 
micron-level requirements of CMOS-4 technology. 

Table 1 CMOS-4 Film Types, Thicknesses, and Critical Dimensions

Process Level     Material                   Final Thickness Target   Critical Dimensions

Dielectric 1      Phosphorus-doped PE-TEOS   7500 Å
                  and boron oxide
Metal 1 contact   Ti/TiN and W plug          Plug recess <3000 Å      0.75 by 0.75 µm contact size
Metal 1           Al:1%Cu/TiN cap            7500 Å                   1.50/0.75 µm line/space
Dielectric 2      PE-TEOS and SOG            7500 Å
Metal 2 contact   Ti/TiN and W plug          Plug recess <3000 Å      0.75 by 0.75 µm via size
Metal 2           Al:1%Cu/TiN cap            7500 Å                   1.88/0.75 µm line/space
Dielectric 3      PE-TEOS and SOG            18,000 Å
Metal 3 contact   Al:1%Cu                    >0.3 µm                  3.0 by 3.0 µm via size
Metal 3           TiN/Al:1%Cu/TiN cap        20,000 Å                 4.5/3.0 µm line/space
Passivation       PE-TEOS                    7500 Å


Figure 1 Cross-sectional Photomicrograph and Schematic of CMOS-4 Interconnect Structure

Key: 1 field oxide; 2 polysilicon; 3 dielectric 1; 4a tungsten 1 plug
to polysilicon; 4b tungsten 1 plug to source/drain; 5 metal 1;
6 dielectric 2; 7 tungsten 2; 8 metal 2; 9 dielectric 3; 10 metal 3;
11 passivation



Modification of Back-end Processes 

The details of development efforts and their
resolution for photolithography, plasma etch, and
PVD metallization are discussed in the following
sections.

Photolithography 

The CMOS-4 photolithography process uses single-layer
photoresists, reduction steppers with an
exposing wavelength of 436 nanometers (nm) and
lenses with numerical apertures (NA) of 0.45 and 0.54, and
spray/puddle develop. A photoresist thickness of
1.2 µm is used at contact levels to enhance resolution.
The photoresist thickness used at metal 1 and
metal 2 is 2.0 µm in order to prevent total photoresist
erosion from the higher steps during the etch
process. Metal 3 contact and metal 3 use a thicker,
3.5-µm photoresist to accommodate tapered oxide
etching, extreme topography, and high loss of photoresist
during etch. In the manufacturing line, multiple
track and stepper combinations are allowed.
Critical dimension control is maintained by running
fixed exposures and monitoring process E0
once a shift for the possible coat-and-expose equipment
sequences. Overlay control of ±0.20 µm is
achieved by running a lot pilot wafer and using
alignment offsets to center the lot distribution at
zero.

The CMOS-4 photolithography process was trans- 
ferred into the existing CMOS-3 production line 
without any modifications to existing equipment. 
However, the introduction of back-end processes 
did present resolution problems at the metal 2 and 
contact layers and poor overlay performance at all 
metal layers. 

Resolution The challenge for many of the upper
levels was to routinely resolve 0.75-µm geometries.
For contact and via levels, part of the solution was
to use thinner photoresist, overexposure, and only
the higher NA steppers. Initially, material processed
at these levels periodically exhibited distorted or
missing contacts. Experimentation with focus/exposure
matrices showed no further improvement
with overexposure. Also, slight focus shifts
were enough to considerably distort the contact
pattern. A 0.5-µm focus offset provided the necessary
latitude to eliminate resolution problems.

Metal 2 Processing Poor pattern definition caused
both bridged photoresist lines over topography and
rounding of the tips of lines. The photoresist could
not be thinned below 2.0 µm due to the high loss of
photoresist during etch. Elimination of dyed photoresist
was an option because titanium nitride
(TiN) antireflective coating (ARC) was used below
the photoresist layer. Undyed photoresist eliminated
the bridging problem. A focus shift similar to
that used at the contact levels was necessary to best
resolve the tips of the lines. Factorially designed
experiments indicated that higher exposure and
further defocus would help minimize metal short
circuits. The increased exposure widened the gap
between potentially short-circuited lines, and the
defocus increased the degree of proximity effect, 3
leaving the dense lines significantly smaller than
the isolated lines. The 0.5-µm focus offset was again
selected as optimum. It decreased the photolithography
contribution to metal short circuits yet
still produced line widths that met design guide
criteria.

Alignment Maximum misalignment tolerance is 
dictated by the reliability requirement to ensure 
100 percent contact coverage. The alignment toler- 
ance calculation was based on maximum allowable 
contact size and minimum metal line widths. For 
example: 



Maximum metal 2 contact = 0.95 µm
Minimum metal 2 line width = 1.55 µm
Misalignment tolerance = (1.55 - 0.95)/2 = 0.30 µm
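
The same calculation can be written as a small helper and compared against the overlay figures quoted elsewhere in this section. This is a sketch only; the function names are illustrative, and the comparison treats each quoted overlay value as a single worst-case number.

```python
# Sketch of the alignment-tolerance calculation from the example above.
def misalignment_tolerance(min_line_width_um, max_contact_um):
    """Largest centre-to-centre offset that still keeps the contact fully covered."""
    return (min_line_width_um - max_contact_um) / 2.0


def within_tolerance(overlay_um, tolerance_um):
    """True if the measured overlay error fits inside the tolerance."""
    return overlay_um <= tolerance_um


tol_um = misalignment_tolerance(1.55, 0.95)

# Overlay figures quoted in the text: about 0.40 um before the cutout process,
# about 0.20 um at metal 2 and metal 3 with it.
initial_ok = within_tolerance(0.40, tol_um)
cutout_ok = within_tolerance(0.20, tol_um)

print(f"tolerance: {tol_um:.2f} um, initial ok: {initial_ok}, cutout ok: {cutout_ok}")
```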

Similar calculations performed at metal 1 contact 
and metal 3 contact indicated similar tolerances 
were needed. However, signal-to-noise ratios with 
the alignment systems were extremely low and 
overly sensitive to minor process fluctuations, such 
as changes in film thickness, film reflectivity, sur- 
face roughness, extent of planarization, and grain 
boundary highlighting. Initial attempts indicated
that misalignment, when alignment was possible at all,
averaged above 0.40 µm for a lot. New alignment systems
specifically designed for metal and planarized 
dielectric were considered. However, the decision 
was made to develop a new processing technique 
on the existing equipment. 

A logical approach to the problem was to eliminate
the metal from the alignment target areas.
All efforts at optimizing target size still rendered
only a marginally acceptable process. "Cutout"
processing at metal 3 was used from the beginning of
CMOS-4 development and was gradually introduced
to the other metal layers as wafer lot volume
increased and problems with alignment increased
cycle times and scrap rates.

The cutout process involves running an extra 
masking step prior to metal alignment, which 
exposes the underlying alignment targets. Since the 
cutout openings are large, misalignment tolerance 
is on the order of microns and is achievable by align- 
ing the cutout mask in a manual global alignment 
mode followed by a blind step sequence. Wafers are 
subsequently either dry or wet etched to remove 
the TiN and aluminum (Al), then returned to photo- 
lithography for standard metal alignment process- 
ing. Overlay results using the cutout at metal 2 and
metal 3 average approximately 0.20 µm (±3σ).
Metal 1 performance is approximately 0.15 µm
(±3σ) by aligning to the still visible active area
marks.

Plasma Etch/Strip Processes
The CMOS-4 back-end oxide etches, metal etches,
and photoresist strip processes were developed on
the existing manufacturing equipment. A straight-walled
contact process for metal 1 contact (M1C)
and metal 2 contact (M2C) needed to be developed
for incorporation with the tungsten plug technology.
Additionally, the 0.75-µm wide contacts with
straight sidewalls (greater than 85 degrees) and
aggressive aspect ratios (height/width) required
optimization of the photoresist strip processes. The
metal 3 contact (M3C) tapered etch process was
also redeveloped to meet CMOS-4 electromigration
reliability requirements.

In addition to a straight-walled contact process
for use with tungsten plugs, excellent selectivity to
underlying materials was required to compensate
for the increased planarity of the CMOS-4 dielectrics.
For example, the planarity of the dielectric
between polysilicon and metal 1 meant that the
contacts to a polysilicon gate would be etched through
the thinnest dielectric (approximately 0.75 µm),
whereas the contacts to active area regions would
need to be etched through a much thicker dielectric.
In the worst case, the dielectric over the active area is
approximately two times the thickness of the dielectric
above the polysilicon layer, or approximately 1.5 µm,
as shown in Figure 2. This difference in dielectric film
thickness meant that the cobalt silicide (CoSi2) film
over the polysilicon contact would be overetched
by approximately 100 percent during the M1C etch
process. Therefore, the M1C etch process needed
a very high selectivity, that is, a high ratio of the
etch rate of the oxide to the etch rate of the CoSi2. 4

Figure 2 Schematic Drawing Showing
the Difference in Step Height for
Metal 1 Contact to Polysilicon
and Source/Drain Regions

Straight-walled Contact Process The straight-walled
contact process development was accomplished
by experimenting with bias voltage,
trifluoromethane:oxygen (CHF3:O2) gas ratio, and
pressure. 5 The substrate bias (to control photoresist,
oxide, and CoSi2 etch rates) and the ratio of
CHF3:O2 flows (to control sidewall profile and photoresist
pullback) were determined to be the most
critical parameters. The optimized process resulted
in a high-throughput, uniform, 0.75-µm straight-walled
contact process with an oxide-to-cobalt silicide
selectivity of greater than 25:1. The underlying
material at M2C etch is an aluminum (Al) alloy.
Because of a very high selectivity of oxide-to-Al
etch rates, an overetch of up to 100 percent at M2C
etch did not impact the contact profile or the contact
resistance.
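
The overetch and selectivity figures above can be tied together with a simple estimate. The sketch below assumes a uniform oxide etch rate, reads the 1.5-µm figure as the dielectric thickness over the active area, and ignores any additional intentional overetch; the silicide-loss result is therefore an illustrative upper bound derived here, not a number reported in the text.

```python
# Sketch: overetch seen by the shallow (polysilicon) contact while the deep
# (source/drain) contact is still being cleared, and the resulting underlayer loss.
def overetch_percent(thin_um, thick_um):
    """Extra etch time on the thin stack, as a percentage of its own clear time."""
    return 100.0 * (thick_um - thin_um) / thin_um


def max_underlayer_loss_nm(thin_um, thick_um, selectivity):
    """Underlayer removed during the overetch, given an oxide:underlayer etch-rate ratio."""
    overetch_depth_um = thick_um - thin_um  # oxide-equivalent depth etched after clearing
    return 1000.0 * overetch_depth_um / selectivity


THIN_UM = 0.75       # dielectric over polysilicon (from the text)
THICK_UM = 1.5       # dielectric over active area (reading of the text's worst case)
SELECTIVITY = 25.0   # oxide-to-CoSi2 selectivity is "greater than 25:1"

poly_overetch = overetch_percent(THIN_UM, THICK_UM)
silicide_loss_nm = max_underlayer_loss_nm(THIN_UM, THICK_UM, SELECTIVITY)

print(f"overetch on polysilicon contact: {poly_overetch:.0f} %")
print(f"CoSi2 loss at 25:1 selectivity:  <= {silicide_loss_nm:.0f} nm")
```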

Optimized Photoresist Strip Process The opti- 
mized M1C/M2C straight-walled etch process 
altered the amount of sidewall polymer formed 
in the contacts during the contact etch process. 6 In 
addition, the change in aspect ratio affected the 
ability of wet chemicals to remove all the photo- 
resist and polymer during photoresist strip process- 
ing. It was shown empirically that residual poly- 
mer remaining in contacts after photoresist strip 
impacted contact resistance. The contact photo- 
resist strip process was modified from a two-step 
dry/wet process to a three-step wet/dry/wet pro- 
cess. The first wet strip was required to pre-wet 
the polymer/photoresist in the contacts. The bulk 
of the photoresist was then removed in an oxygen 
downstream plasma stripper (dry) and was fol- 
lowed with a final wet strip to remove any residues. 
Beginning with a wet strip cycle also improves side- 
wall polymer removal for metal etch processes. 
Consequently, all the CMOS-4 back-end resist strip 
processes follow a wet/dry/wet strip process flow. 

Metal 3 Contact Etch Process Initially, the M3C
tapered etch process did not consistently meet the
CMOS-4 electromigration minimum step-coverage
requirement of 0.30 µm of metal on M3C sidewalls.
A unique set of problems existed at M3C etch. The 
nonuniformity in the underlying topography 
caused the photoresist coat to be thinned over iso- 
lated metal lines. This thin coat led to early photo- 
resist breakthrough and subsequent dielectric 
erosion during M3C etch. A thicker photoresist coat 
led to steeper sidewalls, which resulted in a degra- 
dation of metal step coverage. 

An experimental design was used to optimize the
photoresist and etch processes in unison. 7 The
photoresist thickness was increased to 3.4 µm, and
a focus offset was implemented during exposure to
slope the photoresist profile prior to etch. The
optimized photolithography process resulted in a
pre-etch photoresist profile of 80 degrees ± 3 degrees.
A multiple-step, tapered etch process was developed
in which step 1 etched 40 percent of the contact
depth anisotropically to maintain final contact
critical dimension control. Steps 2 and 4 are photoresist
pullback steps. These critical etch steps were
optimized by running a series of designed experiments
to characterize the responses of photoresist
etch rate, uniformity, and lateral-to-vertical erosion
rates. Etch rates were determined using patterned
oxide test wafers, which were cross-sectioned and
analyzed using the scanning electron microscope
(SEM). Substrate bias and oxygen flow were the
primary parameters controlling lateral-to-vertical
photoresist erosion rates. Steps 3 and 5 transfer the
tapered photoresist profiles into the dielectric film.
Figure 3 depicts this progression. Step 6 is a final
clean-up step. The optimized M3C process demonstrates
a post-etch contact slope of 65 degrees
±5 degrees, which consistently results in metal step
coverage exceeding the 0.30-µm requirement.
Measurements obtained by SEM are compared in
Figure 4.

Figure 3 Schematic Drawing Showing Metal 3 Contact Tapered Process

Figure 4 Comparison of Metal 3 Step Coverage with Standard and Optimized Contact Tapers

Physical Vapor Deposition Metallization
The PVD metallization processes are performed in a
single-wafer, multichamber, high-vacuum sputter
deposition system. The chambers include a radio
frequency (RF) etch, titanium, cobalt, and aluminum-copper
cathodes. The RF etch provides a sputter
cleaning of the wafer to remove any native oxide
from the wafer surface before the film is deposited.
The titanium target is used to reactively form the
TiN film that is used for local interconnect, tungsten
adhesion layers, antireflective coatings, and a fuse
link. The cobalt target is used for silicide formation
on gate and source/drain regions. The Al:1%Cu alloy
is used for three levels of on-chip low-resistance
interconnect.

The reactive sputtering of the TiN film has pro- 
vided a number of challenges in the area of particle 
control. Early modeling in the CMOS-4 development 
cycle using the CMOS-3 yield model predicted the 
impact of the TiN particle levels on the reduced 
metal pitch areas of the CMOS-4 process. The model 
indicated that the number of defects per 100 meters 
of interconnect would have to be reduced from the 
level of 20 defects added. To prevent the TiN film 
and associated interconnect from being among the 
top yield limiters, a level of less than 5 defects had 
to be attained. 



To lower the number of defects, Digital process 
engineers worked with the equipment vendor to 
design and develop a new cathode. The CMOS-3 cath- 
ode used for both the TiN and Al films was a mag- 
netron configuration, designed to enhance the 
deposition rate of the target material by creating 
additional bombarding ions. The magnetron has a 
fixed set of magnets oriented to confine the ioniz- 
ing electrons and thus cause the target to erode in 
a racetrack pattern. This confinement results in 
re-deposition on areas of the target that are not 
sputtered. These areas become a major particle 
source. The new design is based on a set of rotating 
magnets that move the erosion pattern over the 
entire surface of the target and keep the surface 
free of any material build-up. Because of this 
enhanced sputter uniformity, the rotating configu- 
ration maintains a low particle count throughout 
the life of the target. The conventional magnetron 
and rotating magnetron defect density levels are 
compared in Figure 5. 8 

Figure 5 Cathode Configuration as Compared
to Defect Density

Titanium-nitride Film The TiN process required 
re-optimization to characterize the rotating cath- 
ode and accommodate the CMOS-4 application of 
TiN as a tungsten plug adhesive layer. In addition 
to lowered particle levels, the process focused on 
achieving low contact resistance between the vari- 
ous metallurgical interfaces. Screening studies
were performed to characterize adhesion layer
properties for M1C and M2C via plugs. While minor
effects on via resistance were observed for some of
the factors, the single most important factor was
the presence or absence of a titanium underlayer.
With as little as 15 nm of titanium beneath the TiN,
low via resistance was obtained. Without titanium,
no conditions resulted in acceptably low resistance.
These results are shown in Figure 6, which is
a cumulative probability plot of via resistance for
three adhesion layer processes: (1) 120 nm of TiN
deposited using conditions acceptable at the M1C
level, (2) 120 nm of TiN deposited using a modified
deposition process, and (3) a film with 40 nm of titanium
beneath 80 nm of TiN.

The importance of overall process integration
became apparent when it was determined that the
rotating cathode was damaging transistor characteristics
during sputtering of the local interconnect
TiN. The TiN deposition process was modified to
decrease the substrate bias, and the new process
recipe was characterized and retrofitted into the
contact and via levels. 9 This modification resulted
in a more manufacturable single recipe to maintain
for all CMOS-4 TiN levels.

Figure 6 Titanium Nitride Adhesive Layer
Optimization

Deposition of the Al interconnect film for CMOS-4
required little alteration from the CMOS-3 recipes.
Film thicknesses were reduced by 500 angstroms
(Å) for metal 1 and metal 2 to maintain aspect
ratios of no greater than 1. Metal 3 film thickness
remained the same. The incorporation of the tungsten
plug process into CMOS-4 technology simplified
the metal step-coverage requirements because
filling the contact with tungsten reduced the effective
aspect ratio significantly. The metal 3 contacts used
a tapered non-tungsten process and therefore
required that the high step-coverage process developed
for CMOS-3 technology be maintained and further
improved for the CMOS-4 process.
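
A quick check of the aspect-ratio statement is sketched below. It assumes that the aspect ratio in question is metal thickness divided by the minimum metal-to-metal space (the text does not define it), and it takes the 7500 Å final thickness and 0.75-µm space from Table 1; the 8000 Å "before" value is inferred from the 500 Å reduction.

```python
# Sketch: metal line aspect ratio before and after the 500-angstrom reduction.
ANGSTROM_PER_UM = 10000.0


def aspect_ratio(thickness_angstrom, space_um):
    """ASSUMED definition: film thickness divided by the minimum line spacing."""
    return thickness_angstrom / (space_um * ANGSTROM_PER_UM)


MIN_SPACE_UM = 0.75            # metal 1 and metal 2 space (Table 1)
FINAL_THICKNESS_A = 7500.0     # metal 1 and metal 2 target (Table 1)
PRIOR_THICKNESS_A = FINAL_THICKNESS_A + 500.0  # inferred from the stated reduction

ar_after = aspect_ratio(FINAL_THICKNESS_A, MIN_SPACE_UM)
ar_before = aspect_ratio(PRIOR_THICKNESS_A, MIN_SPACE_UM)

print(f"aspect ratio before: {ar_before:.3f}, after: {ar_after:.3f}")
```

Under this definition the reduced thickness lands exactly on an aspect ratio of 1, which is consistent with the stated limit.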

The PVD Al step-coverage process is a three-step
process, optimized for wafer temperature, aspect
ratio, and the underlying material. The first step is
a low-temperature deposition of a nucleation layer
that provides a continuous coating over all surfaces.
Step 2 uses a low-power temperature ramp to
allow enhanced surface mobility. Controlling the
time during this step increases the average distance
that material can move along the surface and into
the contacts. A high-temperature, high-power third
step is used to reach the final film thickness. 10

Development efforts provided a manufacturable, 
high-throughput, and high-yielding metallization 
process. A modified TiN cathode made the existing 
equipment set serviceable for another technology 
generation. The use of the historical equipment data- 
base minimized development time for the new tech- 
nology, and increased the level of understanding for 
a deposition technology with known benefits. The 
incremental nature of the changes from the CMOS-3 
to the CMOS-4 process allowed an efficient technol- 
ogy transfer due to a reduced number of learning 
cycles and the use of a familiar equipment set. 

Blanket Tungsten Plugs Process 
Development 

As stated previously, new process technologies 
were developed to meet the CMOS-4 process crite- 
ria. Blanket tungsten plugs are an example of a new 
technology. As Figure 1 illustrates, blanket tungsten
plugs are used in the CMOS-4 technology to vertically
interconnect metal 1 to the silicon substrate
or to polysilicon (i.e., contacts), as well as to vertically
interconnect metal 1 and metal 2 (i.e., vias).
Blanket tungsten plugs were selected for use in the 
CMOS-4 technology because they minimize spatial 
requirements for contacts and vias by allowing 
the use of vertical-wall openings rather than the 
tapered openings used in earlier technologies. 

At the outset of the CMOS-4 process development 
cycle, two tungsten plug technologies, selective 
and blanket, were evaluated. Although these plug 
technologies are very similar in their final structural 
form, their formation involves important differ- 
ences. As shown in Figure 7a, selective tungsten 
plugs are formed by the deposition of tungsten only 
on conductive surfaces and not on the surrounding 
oxide surfaces. Such growth leads to a filling of 
openings from the bottom up. In comparison, blan- 
ket tungsten plugs, as shown in Figure 7b, are 
formed by a three-step process: 

1. Sputter deposition of an adhesion layer 

2. CVD of a conformal blanket tungsten film 

3. Reactive ion etch back of the tungsten and adhe- 
sion layer to leave tungsten plugs surrounded 
by the adhesion layer 

Blanket tungsten plug processing thus fills openings
from the sides as well as from the bottom up.
As a result, variable contact and via depths do not
pose a problem for blanket plug technology (see
Figure 7b).
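
The reason depth drops out can be seen from an idealized geometric sketch. It assumes a perfectly conformal film and vertical sidewalls, which is a simplification of the real process; the 0.85-µm deposited thickness is the sum of the nucleation and bulk layers described later in this paper.

```python
# Sketch: an ideally conformal film closes a vertical-walled opening when the
# growth fronts from the two sidewalls meet, regardless of how deep the opening is.
def closure_thickness_um(opening_width_um):
    """Film thickness at which growth from both sidewalls meets in the middle."""
    return opening_width_um / 2.0


def is_filled(opening_width_um, film_thickness_um):
    """True once the deposited thickness reaches the closure thickness."""
    return film_thickness_um >= closure_thickness_um(opening_width_um)


CONTACT_WIDTH_UM = 0.75        # contact and via size (Table 1)
DEPOSITED_UM = 0.050 + 0.800   # nucleation layer plus bulk tungsten

needed_um = closure_thickness_um(CONTACT_WIDTH_UM)
filled = is_filled(CONTACT_WIDTH_UM, DEPOSITED_UM)

print(f"closure thickness: {needed_um} um, filled: {filled}")
```

Note that no depth argument appears anywhere in the calculation, which is the point made in the text about variable contact and via depths.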

Although selective tungsten is the simpler of the 
two tungsten plug formation schemes, it proved 
not to be manufacturable in the CMOS-4 develop- 
ment time frame. Our evaluations demonstrated 
a general inability to reproducibly generate low- 
resistance interconnections while simultaneously 
controlling selectivity loss. As a consequence, blan- 
ket tungsten plugs were chosen for use in the 
CMOS-4 process. 

Blanket Tungsten Plug Requirements 
Tungsten plugs and their processing have simple 
structural and electrical requirements. Structurally, 
the plugs must be free of voids and flush with 
the surrounding oxide surface after all undesired 
tungsten and adhesion layer residues have been 
etched from the top surface of the entire wafer. 
The void-free constraint ensures that potentially 
damaging process materials are not trapped in 
voids. The requirement that the plug be flush with 
its surrounding surface ensures good step coverage 
during the subsequent deposition of Al. The photo- 
micrographs in Figure 8 illustrate these structural 
attributes. Electrically, the plug resistivity must be 
high enough to induce current spreading in the 
plug, but the interfacial resistance between the 
plug and other conducting materials must be low 
enough not to adversely affect circuit performance. 
In addition, the processing of the plugs must not 
induce device damage. The latter point refers to the
results of early investigations of tungsten plugs,
which indicated that device damage could occur as
a result of the attack of underlying materials during
the CVD process of tungsten. 11, 12

Finally, in addition to the structural and electrical 
objectives, the plug formation process had to be 
optimized from a cost perspective. Cost was of par- 
ticular concern because the industry-wide blanket 
tungsten deposition method of choice at the time 
(low-pressure CVD) involved very low deposition 
rates and made very inefficient use of the expensive
source gas, tungsten hexafluoride (WF6).

Blanket Tungsten Plug Formation — 
Equipment and Process 

An Applied Materials Precision 5000 tungsten system
was used for tungsten plug processing. The
Precision 5000 system performs both tungsten
deposition and etch back without breaking vacuum.
Processing involves first loading a wafer into
the deposition chamber. Then a nucleation layer
approximately 50 nm thick is deposited at approximately
475 degrees Celsius by the silane reduction
of WF6:

SiH4 + WF6 → W + SiF4 (1)

The bulk of the tungsten layer, approximately
800 nm, is then deposited using the hydrogen
reduction of WF6:

H2 + WF6 → W + HF (2)

Hydrogen reduction chemistry is used for the
bulk of tungsten deposition because it yields good
step coverage, whereas silane reduction does not. 13
However, silane reduction chemistry is used to initiate
tungsten growth because hydrogen reduction
chemistry involves an incubation period before film
deposition begins on TiN, 14 a step not required by
silane reduction. 15



Following tungsten deposition, the wafer is
raised to expose its backside, and a short nitrogen
trifluoride (NF3) plasma etch is then performed in
this same chamber to remove the small amount of
tungsten that deposits on the backside edges of
the wafer. The wafer is then transferred to a chamber
in which a two-part etch back is performed.
The bulk tungsten film is first etched in a sulfur
hexafluoride/argon (SF6/Ar) plasma according to
equation (3):

W + SF6 → WF6 + SFx (3)

until an optical emission from nitrogen, which has
been liberated from the underlying adhesion layer,
is detected. At this point, a chlorine/argon (Cl2/Ar)
plasma is used to remove any remaining adhesion
layer according to equation (4):

TiN/Ti + Cl2 → TiCl4 + N2 (4)

Figure 8 Photomicrographs Showing Several Possible Plug Structural Attributes



Both etching steps may employ a rotating mag- 
netic field, which serves to improve the uniformity 
of the etching. 

Blanket Tungsten Deposition 

The more important properties associated with the
tungsten deposition process include:

■ Thickness uniformity

■ Deposition rate

■ Film resistivity

■ Step coverage

■ Film stress

■ Surface smoothness

■ Tungsten hexafluoride conversion

During development of the blanket tungsten
deposition process, these properties were studied
and "globally" optimized with respect to the plug
objectives. A screening study was performed, followed
by a response surface modeling (RSM) study.
Table 2 shows the results of the screening study of
these properties. Seven process factors were varied
over the ranges given in Table 2. Four factors were
identified to have an impact on the responses of
interest (i.e., the properties listed above).

Table 2 Process Factors and Ranges Used
in the Blanket Tungsten Deposition
Screening Study

Factor                          Range

Spacing†                        200-600 mils
Susceptor temperature†          400-475°C
Total pressure                  60-80 torr
Partial pressure, H2†           16-32 torr
Partial pressure, WF6†          1-1.9 torr
Partial pressure, carrier gas   1.9-7.2 torr
Backside purge, N2              300-700 sccm*

Notes:
Table 2 is adapted from data published in reference 16.
†Indicates factors found to have the greatest impact on
responses of interest.
*sccm = standard cubic centimeters per minute

A detailed determination of the effect of these
four factors was next performed in the context
of an RSM study. The process factors and ranges
used for this study are given in Table 3. RSM studies
produce mathematical models that relate factors
and responses of interest. In this study, seven different
models, one for each response, were generated.
The models were then used to search for points of
interest either computationally or graphically. An
example of a set of contour plots used for a graphical
search for tungsten step coverage and sheet-resistance
uniformity (used to monitor thickness
uniformity) is shown in Figure 9.

Table 3* Process Factors and Ranges Used
in the Blanket Tungsten Deposition
RSM Study

Factor                    Range          Preferred Settings†

Spacing                   200-600 mils   ~400 mils
Susceptor temperature     430-490°C      475°C
Partial pressure, H2      6-30 torr      ~18 torr
Partial pressure, WF6     1-2 torr       ~1.75 torr

Notes:
*Table 3 is adapted from data published in reference 17.
†Also shown are the preferred deposition conditions that
satisfy the optimization criteria shown in Table 4.

The specific criteria for the tungsten deposition 
optimization search are given in Table 4. The seven 
mathematical models/contour plots described 
above were used to find a region of the factor space 
where all of the optimization criteria could be 
met simultaneously. The preferred deposition con- 
ditions associated with this region are shown in 
Table 3. The models also indicated a relatively large
process range within which the optimization objec- 
tives could be met. This range is important data for 
a production process. 

Contour plots for deposition rate, resistivity, 
stress, WF6 conversion, and reflectance (used to
monitor surface smoothness) are not included 
in this paper. However, evaluation of these plots 
showed that the relevant optimization criteria 
could be met throughout the factor space studied. 
As a result, these responses did not impose con- 
straints on the selection of the optimized process. 

The criteria set for deposition rate and WF6
conversion corresponded to values significantly higher
(by a factor of approximately 3 to 10) than those
typical of blanket tungsten processes at the time of
our study. 15,16,17 The improvement resulted mainly
from the use of higher deposition pressures
compared to those previously used (80 torr versus
less than 1 torr). Higher pressure also improved
smoothness of the film, as seen in Figure 10. A 
smoother film is important because roughness on 
the tungsten film can be transferred into the under- 
lying oxide during tungsten etch back. 

The optimization criterion for film stress was 
set at a level corresponding to a mechanically stable 
film (i.e., one that would not peel spontaneously). 
Thus, although the stress values obtained are rela- 
tively high, they are below the critical level associ- 
ated with delamination. For the tungsten resistivity,
the observed values ranged from approximately
7.7 to 10.5 μohm-cm and were all
acceptable.

Unlike the other optimization criteria, those for 
step coverage and sheet-resistance uniformity were 
not met throughout the factor space studied (see 
Figure 9). Tungsten step coverage can directly 
impact void formation, and tungsten thickness uni- 
formity can impact plug recess control. Figure 11 
illustrates how thickness variation across a wafer 
can lead to variations in plug recess following etch 
back. Because of the importance attached to meet- 
ing these two optimization criteria, the allowed 
process window was diminished in size. Figure 9
shows that a step coverage greater than or equal to
95 percent restricted the WF6 partial pressure to
approximately 1.5 torr or more, the hydrogen (H2)
partial pressure to approximately 18 torr or less,
and the gas-inlet-to-wafer spacing to 400 mils or
less. A sheet-resistance uniformity less than or
equal to 3 percent served to further restrict the
spacing to values between approximately 300 and
400 mils.
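
The restricted window can be expressed as a simple feasibility check. This is a minimal sketch that treats the approximate limits read from Figure 9 as hard thresholds, which the contour plots do not strictly justify; the function name and argument names are illustrative.

```python
def in_process_window(wf6_torr, h2_torr, spacing_mils):
    """Return True if a setting satisfies both restricted criteria.

    Thresholds are the approximate limits quoted in the text, treated
    here as exact for illustration only.
    """
    # Step coverage >= 95 percent.
    step_coverage_ok = (
        wf6_torr >= 1.5 and h2_torr <= 18.0 and spacing_mils <= 400.0
    )
    # Sheet-resistance uniformity <= 3 percent.
    uniformity_ok = 300.0 <= spacing_mils <= 400.0
    return step_coverage_ok and uniformity_ok


# Preferred settings reported for the RSM study.
preferred_ok = in_process_window(wf6_torr=1.75, h2_torr=18.0, spacing_mils=400.0)
# A wide-spacing setting falls outside the window.
wide_spacing_ok = in_process_window(wf6_torr=1.75, h2_torr=18.0, spacing_mils=600.0)
```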

In addition to the tungsten deposition process 
properties mentioned, the tungsten thickness had 
to be optimized. The upper limit on thickness was 
influenced by cost considerations and by the fact 
that a thinner tungsten deposit has less likelihood 
of being trapped in dielectric troughs. A thinner 
tungsten deposit also requires less overetch to 
remove tungsten spacers that may form on the 
trough sidewalls. As shown in Figure 12, the size of
the spacer formed on a nonplanar dielectric for
zero overetch with a nonisotropic etch depends on
the absolute magnitude of the deposit thickness, T_d,
and the worst-case dielectric sidewall slope, α,
which together determine the local tungsten
thickness range, T_f - T_d.
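
The dependence on thickness and slope can be made concrete with a short calculation. The sketch below assumes a conformal deposit of thickness T_d on a sidewall inclined at angle α from the horizontal, so that the vertically measured thickness is T_f = T_d / cos α. This geometric relation is an assumption made for illustration, since the relation printed in Figure 12 is not fully recoverable here, and the function name is hypothetical.

```python
import math


def local_thickness_range_nm(t_d_nm, slope_deg):
    """Return T_f - T_d for a conformal film on a sloped sidewall.

    ASSUMPTION: T_f = T_d / cos(alpha), with alpha measured from the
    horizontal. This is an illustrative geometric model, not a formula
    confirmed by the source.
    """
    t_f_nm = t_d_nm / math.cos(math.radians(slope_deg))
    return t_f_nm - t_d_nm


# Extra tungsten an anisotropic etch must clear on a 30-degree slope
# for a 650-nm deposit (both values chosen for illustration).
extra_nm = local_thickness_range_nm(650.0, 30.0)
```

Under this assumption, the excess thickness grows both with the deposit thickness and with the sidewall slope, which is consistent with the preference for a thinner deposit and a planarized dielectric.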

The lower limit on thickness was influenced
by the need to fill contact openings with tungsten

Figure 9 Contour Plots of Tungsten Step Coverage and Sheet-resistance Uniformity
(Reprinted from Journal of Vacuum Science Technology; see reference 17.)
[Plot legend: tungsten sheet-resistance uniformity, 1σ, expressed as a percent of the
average of 49 site measurements.]
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Table 4 Tungsten Deposition-related 

Optimization Criteria for a Blanket 
Tungsten Plug Application 



Parameter                      Optimization Criteria

Growth rate                    ≥300 nm per min
Resistivity                    ~8 μohm-cm
Sheet-resistance uniformity    ≤3% (σ)
Tensile stress                 ≤16 × 10^9 dyne per cm^2
Step coverage†                 ≥95%
WF6 conversion                 ≥12%
Reflectance                    ≥25% vs Si @ 436 nm

Notes:

*Table 4 is reprinted from the Journal of Vacuum Science
Technology; see reference 17.

†Step coverage for a trench 1.5 μm deep and 1.0 μm wide.



prior to etch back, and further, to planarize the 
tungsten over the opening to minimize plug recess 
following etch back, as shown in Figure 13. Tungsten
thickness studies showed that a tungsten film
greater than approximately 650 nm is required for 
the submicron-level contacts and vias of CMOS-4 
technology. 

Tungsten Etch Back 

In this section, the tungsten and adhesion layer 
etch-back processes are discussed. 

Bulk Tungsten Etch Chemistry (SF6/Ar) The more
important etch-back process properties for bulk 
tungsten etch include: 



■ Tungsten etch rate 

■ Tungsten etch-rate uniformity 

■ Tungsten/titanium nitride etch-rate ratio 

■ Isotropy (lateral etch rate/vertical etch rate) 

■ Microloading (tungsten plug/tungsten bulk etch- 
rate ratio) 

For bulk tungsten etch, the tungsten etch rate 
impacts throughput and cost. The tungsten etch- 
rate uniformity, tungsten/titanium nitride etch-rate 
ratio, etch isotropy (lateral etch rate versus vertical 
etch rate), and microloading (tungsten plug etch 
rate/bulk tungsten etch rate) all impact the ability 
to produce a residue-free surface, while simultane- 
ously maintaining flush plugs. A nonuniform etch 
rate affects plug recess because it leads to the clear- 
ing of one region of the wafer before the rest. This 
first-to-clear region becomes the site of the worst- 
case plug recess on a wafer. Figure 14 shows the 
plug recess that occurs in high etch-rate regions on 
a wafer when a tungsten film of uniform thickness 
has been etched to the proper end point for the 
lowest etch-rate regions. 
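
The effect of etch-rate nonuniformity on recess can be estimated with simple arithmetic. In the sketch below, the plug is assumed to etch at the local bulk rate multiplied by a microloading factor during the time between local clearing and the end point; the function name and the example etch rates are hypothetical.

```python
def plug_recess_nm(thickness_nm, rate_high, rate_low, microloading=1.0):
    """Estimate plug recess in the high etch-rate region of a wafer.

    thickness_nm : uniform tungsten film thickness
    rate_high    : bulk etch rate in the fastest region (nm per minute)
    rate_low     : bulk etch rate in the slowest region (nm per minute)
    microloading : plug/bulk etch-rate ratio (ASSUMED constant)
    """
    clear_time_high = thickness_nm / rate_high   # fast region clears first
    clear_time_low = thickness_nm / rate_low     # end point for the wafer
    overetch_min = clear_time_low - clear_time_high
    return overetch_min * rate_high * microloading


# A 650-nm film with a 5 percent rate difference (illustrative rates).
recess_nm = plug_recess_nm(650.0, rate_high=525.0, rate_low=500.0)
```

With a microloading factor of 1, the estimate reduces to the film thickness times the fractional rate difference, so thicker films and poorer uniformity both increase the worst-case recess.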

The tungsten etch isotropy affects plug recess by 
increasing the etch time required to clear the final 
residues on a nonplanarized dielectric. Figure 15 
illustrates that an etch isotropy that equals 1 (i.e., 
lateral etch rate equals vertical etch rate) can 
lead to uniform clearing of tungsten on nonplanar 
surfaces, and that an etch isotropy less than 1 
tends to produce spacers that must be removed by 
overetching. The additional time required to clear 
tungsten spacer residues leads to increased plug
recess. Finally, a high tungsten/titanium nitride
etch-rate ratio affects plug recess control indirectly
by preventing the liberation of oxygen from
the underlying dielectric. Oxygen liberation has
been shown to greatly increase the tungsten
microloading factor. 18

Figure 10 Photomicrographs Showing Surface Smoothness of Tungsten Films: (a) Deposited at 80 Torr; (b) Deposited at Less Than 1 Torr

Figure 11 Plug Recess as a Function of Thickness Variation: (a) After Tungsten Deposition; (b) After Etch Back
[Drawing labels: tungsten, titanium/titanium nitride, silicon dioxide, recessed plug, flush plug]

Development of the bulk tungsten etch in SF6/Ar
began with an RSM study of the tungsten etch rate
and etch-rate uniformity. 19 Table 5 shows the factors
and ranges used in the initial study, along with
the preferred etch conditions identified. Attempts
to improve the tungsten/titanium nitride etch-rate
ratio in SF6/Ar using only the factors in Table 5,
while simultaneously maintaining high tungsten
etch rates (greater than 500 nm per minute), were
largely unsuccessful. 20 The desired improvement
was eventually obtained through the incorporation
of active wafer temperature control. Controlling
the wafer temperature between 20 and 40 degrees
Celsius improved the tungsten/titanium nitride
etch-rate ratios from approximately 2:1 to between
10:1 and 50:1, respectively. 20

Adhesion Layer Etch Chemistry (Cl2/Ar) The
more important etch-back process properties for 
adhesion layer etch include: 

■ TiN etch rate 

■ TiN etch-rate uniformity 

■ TiN/tungsten (W) etch-rate ratio

■ TiN/oxide etch-rate ratio 

For adhesion layer etch chemistry in Cl2/Ar, the
titanium nitride etch rate impacts throughput
and cost, while etch-rate uniformity affects the
level of adhesion layer undercutting or trenching
that occurs around a plug (see Figure 8c). The other
two etch properties impact plug recess and oxide
loss. Consequently, a process is desired that etches
TiN at a high and uniform rate while simultaneously
etching tungsten and silicon oxide slowly.

Process development for the adhesion layer etch 
in Cl2/Ar began with an RSM study of the TiN etch
rate and etch-rate uniformity. 19 Table 5 shows the 
factors and ranges used in the initial study, along 
with the preferred etch conditions identified. 




Figure 12 Photomicrograph and Schematic Drawing of a Tungsten Spacer
[Schematic labels: cos α = T_d, oxide, silicon]

Figure 13 Nonplanarity in a Blanket Tungsten Deposit Transferred into Plug Recess: (a) After Tungsten Deposition; (b) After Tungsten Etch Back
(Reprinted from Journal of Vacuum Science Technology; see reference 17.)
[Drawing labels: planar and nonplanar tungsten surfaces, planar and nonplanar tungsten plug surfaces, oxide, tungsten, titanium nitride/titanium]

Figure 14 Plug Recess in High Etch-rate Regions on a Wafer: (a) After Tungsten Deposition; (b) After Etch Back
[Drawing labels: high etch-rate region, low etch-rate region, tungsten, silicon dioxide, titanium/titanium nitride]



Acceptable TiN etch rates (approximately 145 nm
per minute) and etch-rate uniformity (less than or
equal to ±5 percent, 1σ) were achieved. Since the
adhesion layer is only approximately 120 nm thick,
lower etch rates and higher etch-rate nonuniformities
than those for the tungsten etch step can be
tolerated. Wafer temperature control provides
additional latitude against plug sidewall trenching.

Integration of Tungsten Plugs into
CMOS-4 Technology

After the tungsten deposition and etch-back pro- 
cesses were developed, the overall plug formation 
process was integrated into the CMOS-4 technology. 
To ensure that electrical requirements were met 
and that adequate process latitude existed, the fol- 
lowing factors were considered in the integration 
studies: 

■ Dielectric planarization 

■ CoSi 2 thickness 

■ Contact depth 

■ Contact overetch 

■ Adhesion layer 

- Sputter etch preclean 

- Deposition temperature 

- Substrate bias 

- Thickness

- Material (TiN or TiN/Ti)

After the integration studies were completed,
a relatively robust process was developed. At that
time, it was determined that acceptable electrical
results can be obtained over relatively large process
ranges.

Figure 15 Comparison of Results of Tungsten Etch Isotropies: (b) After Etch Back

Dielectric Process Development 

The CMOS-4 process presented two specific techno- 
logical challenges requiring dielectric development 
efforts. First, horizontal scaling without reducing 
metal thicknesses resulted in high aspect ratio 
spaces. The existing dielectric technology could 
not fill these spaces void-free. Second, the introduc- 
tion of blanket tungsten plugs required the develop- 
ment of a dielectric planarization process. 

In addition to meeting gap fill and planarization 
requirements, a dielectric film must also meet a 
number of electrical, mechanical, and deposition 
requirements. The required electrical characteris- 
tics for a dielectric film include low current leakage, 
high breakdown voltage, high electrical resistance, 
low dielectric constant, and low mobile ion and heavy
metal concentrations. Mechanical requirements
include low moisture adsorption, low stress for 
cracking resistance, low particulate levels, and a 
low pinhole density. Important deposition require- 
ments include high deposition rate, good unifor- 
mity, and low deposition temperature (not greater 
than 450 degrees Celsius) for prevention of metal 
hillock formation. 

Gap Filling 

The conventional silane-based oxides could not 
meet the CMOS-4 technology gap fill requirements 
due to poor conformality characteristics. A silane 
oxide profile is typically described as a "breadloaf," 
that is, the film is thicker on the top of a structure, 
but thinner on the sides and bottom, and forms 
cusps along the sidewalls. 21 Silane oxides nucleate in
the gas phase, which causes this reentrant type 
profile and limits the aspect ratio (height/width) 
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Table 5 Process Factors and Ranges for 
the Initial Tungsten Etch-back 
RSM Study and the Preferred 
Etch Conditions Found 



Factor                   Range              Preferred Settings

Tungsten Etch
Total pressure           20-250 millitorr   85 millitorr
SF6 flow rate            20-60 sccm†        60 sccm
Ar flow rate             15-60 sccm         60 sccm
RF power                 200-500 watt       475 watt
Magnetic field           0-100 gauss        20 gauss
Electrode temperature    60°C               60°C

TiN Etch
Total pressure           50-250 millitorr   85 millitorr
Cl2 flow rate            5-35 sccm          10 sccm
Ar flow rate             50-120 sccm        115 sccm
RF power                 150 watt           150 watt
Magnetic field           0-100 gauss        75 gauss
Electrode temperature    60°C               60°C

Notes:

*Table 5 is reprinted with permission from the IHS
Publishing Group; see reference 19.
†sccm = standard cubic centimeters per minute



that can be filled without forming a void to approximately
0.5. 22

To fill the aspect ratio of approximately 1.2
(height/0.75-μm width) for the CMOS-4
process, a tetraethylorthosilicate (TEOS)-based oxide
was used. The conformality of TEOS oxides is much
better than that of silane oxides. Because the
organosilicon compounds produced during a TEOS-based
CVD process have a significantly higher surface 
mobility, the reactive molecules diffuse on the sur- 
face before reacting, which results in better step 
coverage without cusps. 23 TEOS-based oxides can fill 
aspect ratios up to 1.0 void-free. When used in con- 
junction with profile-altering deposition/etch-back 
techniques, as in CMOS-4 technology, aspect ratios 
up to 1.8 can be filled. 22 24 ^ 
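
The aspect-ratio limits quoted above can be collected into a small selection helper. This is a sketch for illustration only: the function and constant names are hypothetical, the limits are treated as sharp cutoffs, and the 0.9-μm line height in the example is an assumed value chosen to give an aspect ratio of 1.2 at a 0.75-μm gap width.

```python
# Approximate void-free fill limits (height/width) quoted in the text.
SILANE_LIMIT = 0.5
TEOS_LIMIT = 1.0
TEOS_DEP_ETCH_LIMIT = 1.8


def gap_fill_option(height_um, width_um):
    """Return the simplest dielectric approach able to fill a gap void-free."""
    aspect_ratio = height_um / width_um
    if aspect_ratio <= SILANE_LIMIT:
        return "silane oxide"
    if aspect_ratio <= TEOS_LIMIT:
        return "TEOS oxide"
    if aspect_ratio <= TEOS_DEP_ETCH_LIMIT:
        return "TEOS oxide with deposition/etch-back"
    return "beyond quoted limits"


# ASSUMED 0.9-um height over a 0.75-um gap gives an aspect ratio near 1.2.
cmos4_choice = gap_fill_option(0.9, 0.75)
```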

TEOS oxides are well suited for use as interlevel
dielectrics. Oxides formed through the plasma
dissociation of oxygen (O2) in the presence of TEOS
(PE-TEOS) are denser than silane oxides and therefore
more resistant to moisture adsorption. PE-TEOS
films are typically under low compressive stress,
which results in higher cracking resistance than
the tensile-stressed silane films. Due to lower
deposition temperatures, TEOS oxides exhibit thermal
stability and less hillock formation. Mobile ion and
heavy metal concentrations are lower with TEOS
oxides, and device testing indicates lower defect
densities. 21,26

In the CMOS-4 process, filling gaps between 
minimum-spaced metal lines was achieved with 
profile-altering techniques in conjunction with a 
PE-TEOS bulk dielectric. Wafers were moved back 
and forth between a series of deposition and etch- 
back steps in a load-locked, multichamber cluster 
tool. Deposition of a planarized dielectric was com- 
pleted in one cassette-to-cassette operation. 

Gap Fill Process Flow 

The gap fill process is shown in Figure 16. It begins 
with a PE-TEOS conformal deposition that is halted 
prior to reaching a thickness that would fill the 
smallest gaps. Next, an argon sputter etch is per- 
formed on the oxide film deposited in the pre- 
vious step. The oxide removal rate is direction 
dependent, with the maximum removal occurring 
45 degrees from vertical. The original 90-degree 
corners are beveled into positively tapered angles. 
Then an ozone-TEOS film is deposited to completely
fill the remaining gaps. Ozone-TEOS is formed by a
thermal reaction in which oxygen atoms are pro- 
duced by the rapid decomposition of ozone. Ozone- 
TEOS film is not desirable as a bulk dielectric due to 
its characteristically low density and tensile stress. 
However, the superior step-coverage characteris- 
tics of ozone-TEOS allow it to fill very small gaps. 

Following ozone-TEOS deposition, a second 
profile-altering etch back is performed. The ozone- 
TEOS is a sacrificial film that is removed in a CHF3
chemistry until it remains only in the small gaps 
and as a spacer along the sidewalls of larger fea- 
tures. The combined effect of sputter etching and 
ozone-TEOS processing is a void-free surface with 
positively sloped sidewalls. Finally, a bulk PE-TEOS 
dielectric film is deposited over this smoothed sur- 
face to reach the desired dielectric film thickness. 

Planarization 

Planarization is a dielectric smoothing process
that reduces the steps
created by underlying interconnect features. A pla- 
narization process minimizes reflective notching, 
reduces the extent of metal overetch, increases the 
thickness of metal over underlying topography, and 
reduces interconnect defect densities. Enhanced 
planarization is required to successfully form tung- 
sten plugs with a blanket etch-back process. The
slope of the surface must be less than 30 degrees
from the horizontal to ensure removal of tungsten
stringers.

Digital Technical Journal Vol. 4 No. 2 Spring 1992

Figure 16 Schematic Drawing and Photomicrograph of Gap-filling PE-TEOS Dielectric Deposition
Process Stages: (a) Initial PE-TEOS Deposition; (c) Anisotropic Etch and PE-TEOS Cap

Two planarization techniques were considered
for use in the CMOS-4 process. The first technique
involved depositing a sacrificial planarizing boron
oxide film and etching the planarized pattern into
the oxide. The second technique was similar,
except a spin-on-glass (SOG) process was used as
the planarizing agent.
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Initially, the boron oxide planarization was cho- 
sen for use in the CMOS-4 process. This technique 
was selected based on a perceived manufacturabil- 
ity benefit over the SOG process. The boron oxide 
planarization can be done in a single cluster tool 
that uses the same serial process as the gap fill pro- 
cess. One cassette-to-cassette operation can both 
fill the minimum gap and planarize the surface. On 
the other hand, SOG planarization requires two sep- 
arate depositions, a spin coat, a cure, and an etch 
operation. 

Boron Oxide Planarization 
Boron oxide is a CVD film deposited by the plasma 
decomposition of trimethylborate (TMB) in the 
presence of O2. The film has a low melting point
and flows at deposition temperatures as low as
400 degrees Celsius. Boron oxide can be etched
back in a CHF3 plasma chemistry with a 1:1 boron
oxide:oxide selectivity. Spaces up to 25 μm can be
fully planarized with boron oxide. 27

Boron Oxide Process Flow 
Upon completion of the gap fill process flow, 
the final bulk dielectric deposition step is targeted 
at approximately three times the desired final 
thickness. This overdeposition provides some ini- 
tial smoothing of the underlying interconnect fea- 
tures and also fills spaces in the 2- to 5-μm range to
eliminate formation of unwanted gaps. Next, two 
sequential boron oxide deposition and etch-back 
steps are performed. The boron oxide film flows
as deposited, thereby smoothing and planarizing
the topography. The isotropic etch back transfers
the planarized surface into the underlying oxide.
The photomicrographs in Figure 17 illustrate boron
oxide planarization. The boron oxide film is a
sacrificial film that is completely removed during this
step. The deposition/etch sequence is repeated to
further improve planarity. After all boron oxide
is removed, the etch chemistry is switched to a
higher-rate carbon tetrafluoride (CF4) anisotropic
etch process that removes the thick bulk deposition
to the final desired thickness.

Figure 17 Photomicrographs Showing Boron Oxide Planarization: (a) Minimum-spaced Metal Lines; (b) Wide-spaced and Isolated Metal Lines

The boron oxide process is used in volume production
for the dielectric between polysilicon
and metal 1. For the metal 1 to metal 2 dielectric,
the boron oxide process could not meet the wafer
volume and uniformity requirements due to problems
with equipment reliability, thickness variability,
and low throughput. The SOG planarization
scheme was selected as a more cost-effective
process.

Spin-on-glass Planarization
The SOG process consists of a series of simple
single-step operations. The overall integrated process
exhibits good throughput and process control. The
photomicrographs in Figure 18 illustrate SOG
etch-back planarization.

SOG is a low-viscosity liquid polymer that is
applied to the wafer using a spin-coating process
similar to that used for photoresists. The SOG material
used for CMOS-4 technology is a siloxane (a polymer
containing methyl groups). Siloxane SOGs exhibit
improved cracking resistance and lower dielectric
constants as compared to other available SOG materials.
Because it is a liquid, SOG can fill very small
gaps and planarize topographical surfaces. The
material is solidified into a glass by curing in a
low-temperature furnace cycle. The cured SOG film has
SiO2-type mechanical and electrical properties, and
can therefore be left behind as part of the bulk
dielectric. Typically, a partial etch back of the material
is performed that leaves SOG only in the gap
areas. 28,29

Figure 18 Photomicrographs Showing SOG Etch-back Planarization: (a) Minimum-spaced Metal Lines; (b) Wide-spaced and Isolated Metal Lines

SOG Process Flow 

The initial gap fill deposition is made thick
enough to provide a buffer for the SOG etch- 
back overetch. SOG is then spun on to fill the 
larger spaces and planarize the surface. A low- 
temperature furnace cure is performed to remove 
the solvents from the SOG and transform the mate- 
rial from a liquid into a glass. 

A partial etch back of the SOG is performed using
a CHF3/CF4/O2 chemistry. The selectivity of SOG to
the underlying PE-TEOS is targeted at 1:1 and is
controlled by the ratio of the CHF3 and O2 gas flows.
Since the O2 flow also affects the etch uniformity,
trade-offs between selectivity and uniformity were
necessary. SOG is etched completely from the tops
of interconnect lines, but remains in the gaps of the
larger-spaced lines. The etch back is targeted to
remove SOG from locations at which contacts will
be formed. Exposing SOG along the sidewalls of
contacts can lead to problems with via "poisoning"
and poor contact profiles. Finally, a PE-TEOS layer is
deposited to achieve the desired dielectric thickness
and encapsulate the SOG.



Summary 

Digital's CMOS-4 on-chip interconnect technology
is a three-level aluminum alloy metallization process,
with planarized TEOS-based silicon dioxide
dielectrics, tungsten-filled contacts and vias, and
a minimum feature size of 0.75 μm. The process
development goals required the maximum use of
the existing manufacturing capability and the
introduction of new process features. For
photolithography, plasma etch, and PVD metallization,
the 1.0-μm manufacturing equipment set and processes
were modified and reoptimized for the
submicron regime. In addition, two new process
features, a blanket CVD tungsten process and a
TEOS-based oxide planarization process, were
developed and implemented in manufacturing to
meet the CMOS-4 technology requirements.
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Implementation of Defect 
Reduction Strategies into 
VLSI Manufacturing 

CMOS-4 technology combines a high-performance microprocessor with a fast, dense 
RAM. Consistently obtaining a specified die yield on CMOS-4 devices required the 
implementation of a series of defect reduction procedures. To achieve high yields, 
microcontamination and defect reduction plans needed to be in place well before the
initiation of product manufacturing. Levels of overall cleanliness had to be specified and
controlled. Process equipment was monitored at the new particle level of 0.375 μm
and greater to collect data. Defect density test reticles were designed and wafers were 
processed. Electrical results were then incorporated into a yield model and used to 
prioritize yield enhancement activities. Experiments were designed to reduce the 
defect levels of process areas, such as p-gate leakage and metal 2 short circuits. 



Fourth-generation complementary metal-oxide 
semiconductor (CMOS-4) technology calls for a die 
area greater than 2 square centimeters (cm2),
geometries of 0.75 micron (μm), a gate oxide of 10.5
nanometers (nm), unique metallurgy, 1.7 million 
transistors, and 23 masking levels. These very large- 
scale integration (VLSI) process features required 
for chip performance dictate the need for increased 
defect reduction and microcontamination control 
in the semiconductor production environment. 

Production of one fully functional CMOS-4 device 
is virtually impossible without substantial efforts in 
defect reduction and microcontamination control. 
Obviously, a single 0.75-μm particle in the active
area of a die can create a short circuit and cause 
the entire device to fail. A 10.5-nm particle at gate 
oxide can have the same effect. A high level of 
sodium in rinse water can lead to premature gate 
oxide breakdown. In fact, there are approximately 
250 processing steps that could contribute to the 
failure of a chip. 

This paper describes the principles of micro- 
contamination control and relates their application 
to VLSI manufacturing. It next discusses improve- 
ments that we implemented for wafer handling, 
cleaning, and monitoring. It then outlines the over- 
all defect reduction techniques to increase product 
yield, focusing on efforts in the areas of p-gate leak- 
age and metal 2 short circuits. The paper concludes
with considerations for defect reduction in the next
generation of CMOS technology.

Application of Microcontamination 
Control Principles 

Implementation of a successful set of defect reduc- 
tion procedures depends on understanding the 
principles of microcontamination control. This 
field of study encompasses a wide variety of areas. 
Microcontamination control seeks to minimize the 
presence of any substance, particle, monolayer, or 
ionic contaminant in the wafer production
environment that could cause a device to fail.

The facility in which CMOS-4 devices are manu- 
factured was constructed for the production of the 
1.0-μm CMOS-3 technology. However, because of the
high capital costs associated with clean room con- 
struction, the facility was originally designed to 
meet the fabrication needs for three generations of 
CMOS technology: CMOS-3, CMOS-4, and CMOS-5. 
The original design specifications for the air quality 
of the clean room, the clean room suits, and the 
water, chemicals, and gases used in semiconductor 
manufacture are discussed in the following sections. 

Clean Room Air 

The ambient air quality is carefully controlled in a 
clean room. Typically, indoor ambient air contains 
more than one billion particles per cubic foot. 1 To 
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obtain a high yield on CMOS-4 devices, wafers must 
be processed in a well-characterized clean room. 

The level of cleanliness within a clean room is
defined by the clean room class, which is determined
by the number of particles greater than
0.5 μm per cubic foot. The CMOS-4 devices are
manufactured in a class 10 environment, i.e., in a clean
room that has fewer than ten 0.5-μm particles per
cubic foot. To obtain the class figure, air is measured
with laser-scattering techniques when the
clean room facility is at rest. The air filtration
procedure employs high-efficiency particulate air (HEPA)
filters with filter efficiencies greater than 99.9999
percent. Since HEPA technology is well advanced
and understood, most of the particles in clean
rooms are generated when process equipment and
personnel are introduced. A vertical laminar airflow
is maintained to prevent particle deposition on
the product wafers.
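
The class definition translates directly into a check on a particle-counter reading. The sketch below is illustrative: the function name, the sampled volume, and the counts are hypothetical, and a single averaged reading stands in for whatever sampling plan the facility actually used.

```python
def meets_clean_room_class(particle_count, sampled_cubic_feet, class_limit=10):
    """Check a laser particle-counter reading against a clean room class.

    particle_count     : particles of 0.5 um or larger detected
    sampled_cubic_feet : volume of air sampled with the room at rest
    class_limit        : class number, e.g. 10 for a class 10 room
    """
    if sampled_cubic_feet <= 0:
        raise ValueError("sampled volume must be positive")
    particles_per_cubic_foot = particle_count / sampled_cubic_feet
    return particles_per_cubic_foot < class_limit


# Hypothetical readings: 35 particles in 5 cubic feet passes class 10,
# while 60 particles in the same volume does not.
passes = meets_clean_room_class(35, 5.0)
fails = meets_clean_room_class(60, 5.0)
```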

Clean Room Suits 

The material used for the clothing worn by clean
room workers was carefully selected. Humans can
shed up to a million particles of a size greater than
0.5 μm every minute. To protect the wafers from
microcontamination and maintain a class 10
environment, clean room workers must don special
suits. This attire must cover the worker from head
to toe.

When device line widths were greater than
1.0 μm, polyester fabrics were used to contain the
particles emitted by clean room workers. The pore
size of the best polyester fabric is 17 μm, and the filter
efficiency at 0.5 μm is poor at less than 60 percent.
With submicron line widths, new materials needed
to be evaluated for their ability to contain particles.
A new material, an expanded polytetrafluoroethylene
with a pore size of less than 1.0 μm and a filtering
efficiency of greater than 99.99 percent, was
selected as the garment material of choice. 2
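
The difference between the two fabrics is easier to appreciate as a particle count. The sketch below assumes, for illustration only, that the garment acts as a simple filter on everything a worker sheds and that the quoted efficiencies apply at their limiting values; the function name is hypothetical.

```python
def particles_escaping_per_minute(shed_per_minute, filter_efficiency):
    """Particles passing through a garment, modeled as a simple filter.

    filter_efficiency is a fraction between 0 and 1 (ASSUMED to apply
    uniformly to all shed particles larger than 0.5 um).
    """
    if not 0.0 <= filter_efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return shed_per_minute * (1.0 - filter_efficiency)


SHED_RATE = 1_000_000  # upper figure quoted for a clean room worker

# Polyester at its best-case 60 percent: a lower bound on what escapes.
polyester_escape = particles_escaping_per_minute(SHED_RATE, 0.60)
# Expanded PTFE at 99.99 percent: an upper bound on what escapes.
eptfe_escape = particles_escaping_per_minute(SHED_RATE, 0.9999)
```

Under these assumptions, polyester passes at least about 400,000 particles per minute, while the expanded polytetrafluoroethylene passes at most about 100.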

Ultrapure Deionized Water 
Each wafer is exposed to hundreds of gallons of 
ultrapure deionized (DI) water during the 250 pro- 
cessing steps involved in producing CMOS-4 
devices. The purity of the DI water, which is mea- 
sured primarily by resistivity meters, is critical to 
obtaining a high yield on CMOS-4 devices. Not only 
must the number of particles be minimized, but the 
levels of cations, anions, total oxidizable carbon 
(TOC), silica, and bacteria must also be carefully 
regulated and monitored. For example, bacteria
contain phosphorus and can be a source of uncontrolled
dopant. Excess levels of TOC can double the
rate of initial thermal oxidation. Silica is known to 
decrease the reliability of thermally grown oxides, 
and the presence of ionic contaminants can change 
semiconductor carrier lifetimes. 3 Table 1 lists the 
specifications for the DI water system used for 
CMOS-4 production. 

Table 1 Deionized Water Specifications 
for CMOS-4 Technology 

Resistivity               18.0 megohm-cm @ 25°C
Bacteria                  0.05 colonies per milliliter maximum
Particles (>0.5 μm)       200 per liter maximum
Total organic carbon      50 ppb maximum
Silica                    10 ppb maximum
All cations and anions    1.0 ppb maximum

Note:

ppb equals parts per billion
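
A monitoring routine could compare each water sample with the Table 1 limits as follows. This is a sketch rather than the facility's actual monitoring software: the dictionary keys and function name are hypothetical, and treating the 18.0 megohm-cm resistivity as a minimum is an assumption, since the table does not label it.

```python
# Maximum limits from Table 1 (key names are illustrative).
DI_WATER_MAX = {
    "bacteria_per_ml": 0.05,
    "particles_per_liter": 200,
    "toc_ppb": 50,
    "silica_ppb": 10,
    "ions_ppb": 1.0,
}
# ASSUMPTION: the resistivity figure is a minimum at 25 degrees Celsius.
RESISTIVITY_MIN_MEGOHM_CM = 18.0


def out_of_spec(sample):
    """Return a sorted list of parameter names that violate the limits."""
    failures = [
        name
        for name, limit in DI_WATER_MAX.items()
        if sample[name] > limit
    ]
    if sample["resistivity_megohm_cm"] < RESISTIVITY_MIN_MEGOHM_CM:
        failures.append("resistivity_megohm_cm")
    return sorted(failures)


# Hypothetical sample with elevated silica and low resistivity.
sample = {
    "resistivity_megohm_cm": 17.6,
    "bacteria_per_ml": 0.01,
    "particles_per_liter": 120,
    "toc_ppb": 30,
    "silica_ppb": 14,
    "ions_ppb": 0.4,
}
violations = out_of_spec(sample)
```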



Chemicals and Gases 

Wet chemistry is used to clean wafers and in the 
photolithographic process to develop and strip 
photoresist. The particle and impurity levels of the 
incoming chemicals used during wafer processing 
had to be specified and monitored. Studies have 
shown that bare silicon wafers placed in an ammonium
hydroxide/hydrogen peroxide (NH4OH/H2O2)
solution exhibit a linear correlation between the 
metal content in the peroxide and the metal surface 
contamination. The same studies have also shown 
that iron and zinc are more important than other 
metals. 4 

Throughout the manufacturing process, gases 
are employed during gate oxide growth, metal 
depositions, and plasma etches. Both impurity and 
particle levels in these gases are critical. Impurity 
level changes can alter plasma etch-rate uniformity 
or could lead to corrosion. The process of reducing 
the impurity and particle levels of gases is better 
understood than that of wet chemicals. Implement- 
ing new filter technology and employing electro- 
polished materials in gas distribution systems have 
reduced impurity levels. 

Improvements to Wafer Cleanliness 

Prior to implementation of a structured defect 
reduction plan to increase yield on CMOS-4 devices, 
three areas known to require better control were 
improved at a minimal cost and effort. These were
wafer handling, wafer cassette/box cleaning proce- 
dures, and particle per wafer pass monitoring. 

Wafer Handling 

Clean room workers use tweezers to handle wafers 
when they read wafer numbers and load wafers 
into equipment. A full wafer cassette contains 25 
150-millimeter (mm) wafers, and each wafer is sep- 
arated by only 5 mm of clearance. Wafer inspec- 
tions of CMOS-3 devices revealed that scratches 
contributed 10 to 15 percent of the die loss. 

Three corrective actions were implemented to 
reduce the number of scratches. Since the use of 
tweezers to handle wafers was the primary cause of 
scratches, their use on product wafers was banned 
from the production line. Vacuum wands were 
installed throughout the fabrication area. Vacuum 
wands restrict contact to the wafer backside only.
With proper training in the use of wands, workers
can achieve better vertical wafer control. 

In addition to vacuum wands, automatic wafer 
transfer systems were installed in the production 
area. These mass transfer systems allow a worker to 
move an entire 25-wafer lot from one type of cas- 
sette to another, for example, to transfer wafers 
from polypropylene to quartz cassettes prior to a 
photoresist strip. This procedure eliminates man- 
ual roll transfers of wafers, which are known to gen- 
erate particles. 

Automatic wafer handlers were installed on engi- 
neering microscopes as were automated wafer 
sorters. Use of these devices eliminated a major 
source of scratches on experimental lots, which are 
inspected and sorted often. Sneeze guards installed 
at the microscopes were an added insurance 
against damage to any die from spittle. 

Cassette/Box Cleaning Procedure 
Several cleaning procedures for the wafer cassettes 
and boxes were implemented. A surfactant, which 
helps wet the surface, was added to the cleaning 
solution of the boat/box washing equipment. This 
improved the cleaning efficiency over the pre- 
viously used method of DI water only. Wafer cas- 
settes were cleaned more often. Cleaning cycles 
were added at several front-end operations, includ- 
ing well oxidation, initial oxidation, and first- and 
second-gate oxidation. In addition, weekly cleaning 
of cassettes dedicated to specific equipment was 
initiated. 



Since wafer cassettes are known to become 
porous and in time to contaminate the wafer sur- 
face, a test was introduced to determine when 
boats start to degrade. Results indicated that if cas- 
settes took longer than 30 seconds to rinse to resis- 
tivity in DI water, they should be replaced. 

Particle per Wafer Pass Monitoring 
Each piece of process equipment is monitored on a 
routine basis to detect particles added at each wafer 
pass; these checks are performed once per shift. 
The selected wafers are measured on a high-angle 
laser-scattering system designed for unpatterned 
wafer particle detection. The wafers are then pro- 
cessed through the production equipment. After 
final processing, the wafers are measured again for 
particles. The initial readings are subtracted from
these measurements, and the difference, the number of
particles added, is recorded on a trend chart.
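
The bookkeeping behind a PWP check can be sketched in a few lines. This is a minimal illustration, not the monitoring software used on the line: it assumes the particles added are taken as the post-process count minus the pre-process count for each monitor wafer, and the function name and the sample counts are hypothetical.

```python
from statistics import mean


def pwp_adders(pre_counts, post_counts):
    """Particles added per wafer pass: post-process minus pre-process count."""
    if len(pre_counts) != len(post_counts):
        raise ValueError("each wafer needs one pre and one post measurement")
    return [post - pre for pre, post in zip(pre_counts, post_counts)]


# Hypothetical counts for three monitor wafers (not data from the paper).
pre = [12, 8, 15]
post = [17, 10, 22]

adders = pwp_adders(pre, post)
print("adders per wafer:", adders)
print("mean adders for the trend chart:", mean(adders))
```

The mean (or the individual values) would then be plotted against the date of the check to form the trend chart.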

For the 1.0-µm technology, bare silicon wafers
were measured for particles greater than 0.5 µm, or
half the minimum polysilicon line width. This size
was chosen based on the premise that a conductive
particle of this size could degrade device performance
and reliability. Using the same reasoning,
and prior to the implementation of CMOS-4 technology
into manufacturing, the particle monitoring
size was decreased to 0.375 µm.

Particle per wafer pass (PWP) monitoring must 
closely simulate the wafer processing environ- 
ment to which product wafers will be exposed. If 
a process involves an oxide deposition, then the 
PWP process should also. However, it is important 
not to damage the wafer surface during the PWP 
run. For example, gases should be flowed, if pos- 
sible, during a PWP process on an etcher, but a 
bias should not be applied. A bias might cause sur- 
face damage that could result in false particle 
counts. 

A continuous production PWP program must be 
maintained on the process equipment. Small parti- 
cles are held to a surface by strong van der Waals 
forces. These forces increase over time due to the 
particles conforming to the surface, thus increasing 
the contact area. Therefore, once particles are 
deposited on a wafer's surface, they are very diffi- 
cult to remove. 5 Even if each process step con- 
tributed only five particles to each wafer, the 
cumulative effect of 250 process steps would be 
more than 1000 particles deposited on a fully pro- 
cessed wafer. 
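
The arithmetic behind this estimate is simple accumulation. The sketch below assumes every step adds the same number of particles and that none are removed; the function name and the alternative per-step values are illustrative only.

```python
def cumulative_particles(adders_per_step, steps):
    """Total particles on a wafer if every step adds the same number."""
    return adders_per_step * steps


# 250 process steps, as quoted for CMOS-4 devices.
for adders in (5, 1, 0.1):
    total = cumulative_particles(adders, 250)
    print(f"{adders} particles per step -> {total} particles per wafer")
```

At five particles per step the total is 1250, which is the "more than 1000" figure quoted above.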
Defect Reduction Procedures
to Increase Product Yield 

All microcontamination control efforts would be 
fruitless without vehicles to assist in yield predic- 
tion and defect prioritization. The procedures 
developed for CMOS-4 technology are outlined in 
this section. Yield modeling and test chips are 
described elsewhere in this issue. 6 Therefore, the 
following discussion is brief. 

General Yield Model 

For the purposes of this paper, the Poisson yield 
model is used. 1 This model is very simple, but it can 
be used to illustrate some key points. The yield 
model is given by the following equation: 

Y = e^{-AD}

where Y equals yield, A equals chip area in square 
centimeters, and D equals defect density per square 
centimeter. 

It is easy to see that if the yield is equal to 50 percent,
AD must be 0.69. Now if the chip area is
increased by 50 percent, AD becomes 1.04, and the 
yield is 35 percent, if all else remains equal. 
However, each new CMOS technology reduces the 
line widths and decreases the film thicknesses. The 
size of a "killer" defect therefore decreases, which 
automatically increases the baseline defect density. 
For each successive generation of CMOS devices, 
substantial improvements are needed in the reduc- 
tion of defect densities. 
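
The numbers in this example follow directly from the Poisson model. The short sketch below reproduces them; the function name is ours, and the last line adds one derived figure (how far the defect density would have to fall to hold the yield at 50 percent on the larger chip).

```python
import math


def poisson_yield(area_cm2, defect_density_per_cm2):
    """Poisson yield model: Y = exp(-A * D)."""
    return math.exp(-area_cm2 * defect_density_per_cm2)


# A*D product that gives a 50 percent yield: exp(-AD) = 0.5 -> AD = ln 2.
ad_50 = math.log(2)
print(round(ad_50, 2))                        # 0.69

# Increase the chip area by 50 percent with the defect density unchanged.
ad_larger = 1.5 * ad_50
print(round(ad_larger, 2))                    # 1.04
print(round(math.exp(-ad_larger), 2))         # 0.35

# To keep 50 percent yield on the larger chip, D must fall to 1/1.5 of
# its original value, that is, by about one third.
print(round(1 - 1 / 1.5, 2))                  # 0.33
```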

Test Chips 

The test chips used during the manufacture of 
CMOS-4 devices were both full- and short-loop 
defect density test vehicles. Full-process test chips 
included snake structures to capture intralevel 
open circuits, comb patterns for intralevel short cir- 
cuits, and capacitors for interlevel short circuits. 
The full-process test chips were run routinely to 
determine defect reduction priorities, as well as to 
assist in reducing defect levels. 

Short-loop test chips, which are processed 
through 20 to 25 process steps, contain the same 
snakes, combs, and capacitors as the full-process 
test chips. Their purpose is to focus attention on 
certain layers, such as back-end levels, which are 
known to contribute many defects. These short- 
loop chips can be used in designed experiments to 
compare different processes. Short-loop chips also 
monitor shifts in defect density from week to week,
since they can be processed with less than a two-week
cycle time.

Defect Reduction Priorities 
After electrical testing of failed structures, visual 
inspections were performed to identify various 
defect types. It is very common for large defect den- 
sities to be caused by more than one defect type. 
These inspections helped to design the experi- 
ments used to reduce the defect levels found in 
p-gate leakage and metal 2 short circuits. 

In addition, laser- and holography-based auto- 
mated inspection tools were initiated in-line at var- 
ious process steps, including active area after strip 
inspect (ASI), polysilicon ASI, local interconnect 
ASI, tungsten plug 1 and plug 2 ASIs, and metal 1 and 
metal 2 ASIs. These inspections were performed 
routinely on all full- and short-loop test chips. The 
tungsten plug and local interconnect steps were 
chosen because they were not part of previous 
CMOS generations. The remaining steps were cho- 
sen because they were known areas of concern 
based on previous electrical results. 

Defect reduction priorities are determined by 
incorporating electrical results from full-loop 
defect density test chips into a detailed yield model. 
Based on this information, yield enhancement activ- 
ities are prioritized. Two of the areas selected for 
defect reduction were p-gate leakage and metal 2 
short circuits. 
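
One way to picture the prioritization step is to convert each measured defect density into a limited yield and rank the mechanisms by the yield they cost. The sketch below is an assumption-laden illustration, not the detailed yield model used for CMOS-4: it applies the simple Poisson form to every mechanism, and the critical areas, critical lengths, and defect densities are hypothetical values chosen only to show the ranking.

```python
import math


def limited_yield(critical_extent, defect_density):
    """Poisson limited yield for one mechanism: exp(-extent * density)."""
    return math.exp(-critical_extent * defect_density)


def rank_by_yield_loss(mechanisms):
    """Return (name, yield loss) pairs, largest loss first.

    mechanisms maps a name to (critical extent, defect density per unit
    extent); the extent may be an area or a line length.
    """
    losses = {
        name: 1.0 - limited_yield(extent, density)
        for name, (extent, density) in mechanisms.items()
    }
    return sorted(losses.items(), key=lambda item: item[1], reverse=True)


# Hypothetical inputs, for illustration only.
mechanisms = {
    "p-gate leakage": (0.002, 300.0),   # cm^2 of gate area, defects per cm^2
    "metal 2 shorts": (1.0, 0.40),      # m of critical length, defects per m
    "metal 1 shorts": (1.0, 0.20),      # m of critical length, defects per m
}

for name, loss in rank_by_yield_loss(mechanisms):
    print(f"{name}: {loss:.0%} yield loss")
```

The mechanisms at the top of such a list are the ones that justify designed experiments first.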

P-gate Leakage Enhancements 

Historical data obtained from CMOS-3 processing 
highlighted two potential contributors to high 
p-gate leakage values: surface damage and metallic 
contamination. The well etch process, which 
opens up the gate areas, was performed in a hexoid- 
configured, reactive ion etch (RIE) batch etcher. 
Designed experiments, therefore, examined ways 
to reduce potential lattice damage caused by the 
known physical etch process. The batch reactor 
processed 12 wafers at a time, and initial process 
development ensured complete oxide removal 
from all wafer surfaces. Based on uniformities both 
within the wafer and from wafer to wafer, a 30 per- 
cent overetch was chosen. The overetch proce- 
dure, however, exposed the silicon surface of the 
wafers to the plasma for an extended period of 
time, thus exacerbating any surface damage. Initial 
attempts to improve the gate leakage varied the 
overetch between 30 percent and 0 percent. 
Results showed the p-gate leakage values consis-
tently ran at a lower value with the decreased 
overetch, confirming a decrease in the amount of 
surface damage. Optimization work continued to 
lower the p-gate levels further. This work is out- 
lined in the following sections. 

Silicon Lattice Damage 

With the improved overetch process, stacking 
faults and pits continued to be seen in the well 
region. This confirmed the presence of surface 
damage, which required further experimentation. 
Split lots were processed to examine the impact of 
changing power, bias, oxygen flow rate, and pres- 
sure on p-gate leakage and silicon lattice damage. 
Repeated attempts produced identical results; 
none of the changes affected p-gate defect density 
or visual surface damage. At this point, a split lot 
was designed to study various starting materials. 
The original starting material was compared to a 
polysilicon-backed starting material. Polysilicon is 
a known "getterer" of surface damage, that is, it 
attracts the damage to the backside of the wafer; it 
was expected that this type of starting material 
would show an improvement. The visual inspec- 
tion of the polysilicon starting material found 
no stacking faults or pits in the well regions of 
the wafers. The p-gate defect density values also 
showed an improvement. Confirmation material 
verified the initial findings, and the new starting 
material was placed on the manufacturing line. 

Metallic Contamination 

In spite of significant improvements to the process, 
p-gate leakage values continued to show intermit- 
tent failures. The one area not previously investi- 
gated concerned the potential presence of metallic 
contamination at the wafer surface. Patterned and 
unpatterned wafers were sent for total reflection
X-ray fluorescence (TXRF) analysis. The surface
analysis detected the presence of metallic contami- 
nants: specifically cobalt, iron, and nickel. Iron and 
nickel are common elements in stainless-steel com- 
ponents, and it was discovered that the gas distri- 
bution tubes being used in the hexoid etcher were 
constructed of stainless steel. Replacement alu- 
minum gas tubes were installed in the etcher, and 
additional surface analysis tests were taken. As 
expected, the iron and nickel elements were no 
longer detected; however, cobalt was present. 
Consequently, a complete wet clean of the etch 
system was performed, and one final set of etched
wafers was analyzed by TXRF. The results confirmed
the elimination of the cobalt contamination.

The results of TXRF analysis on all metallic con- 
taminants are given in Table 2. Since wet cleans 
were performed routinely on the system, a trend 
chart of p-gate leakage was available to show the 
dates of all completed wet cleans. The trend chart 
showed that coincident with every wet clean was 
an improvement in p-gate defect density. The
p-gate performance would begin to degrade when- 
ever a metal 1 contact process was run in the same 
etcher. The contact etch process opened contacts 
to a cobalt silicide layer, which confirmed the con- 
tact etch process as the source of the variable 
cobalt levels. Cobalt cross-contamination of the 
gates was occurring whenever a well etch was pro- 
cessed immediately following a metal 1 contact 
etch, thus inducing p-gate variability. For this rea- 
son, it was decided to dedicate separate etch tools 
for the well etch process and for the metal 1 con- 
tact etch process. 

Table 2 Results of TXRF Surface Analysis
(Units of 10^12 atoms per cm²)

                            Iron   Cobalt   Nickel
Initial wafer
  Wafer center              2      0.5      2
  45 degrees from center    3      0.4      2
  225 degrees from center   2      0        2
Aluminum gas tubes
  Wafer center              0      0.7      0
  45 degrees from center    0      0.8      0
  225 degrees from center   0      0.6      0
Post wet clean
  Wafer center              0      0        0
  45 degrees from center    0      0        0
  225 degrees from center   0      0        0



Single Wafer Etch System Evaluation 
At the same time the metallic contaminant sources 
were isolated, a well etch process on a single wafer 
etcher was developed. Single wafer etchers have 
improved etch-rate uniformity control over batch 
etchers. Also, the single wafer etch process is more 
chemical (as opposed to physical) than the etch 
process used in batch systems. Optimization of pro- 
cess parameters (e.g., gas flows, pressure, power, 
and gap) was performed on patterned monitor 
wafers. As the final process step, a low-power sur- 
face cleanup was added to remove any remaining 
surface contaminants from the wafer surface as
well as from the top layer of damaged silicon. 
Cross-sectional micrographs were taken to verify 
the integrity of the slopes of the patterned lines. 

Once the process was finalized on monitor
wafers, full-process split lots were run through the
line to compare the hexoid-configured etch process
with the planar-configured etch process. The
results of one of the splits are shown in Figure 1.
A substantial improvement in p-gate defect density
was obtained when using the single wafer etch process.
Defect density levels on the batch reactor portion
averaged 300 defects per cm², whereas the
single wafer etcher portion averaged 15 defects per
cm². Confirmation product lots were processed to
ensure the probe yield was not adversely affected 
by the etch enhancements, and the well etch pro- 
cess was released again on the single wafer etch sys- 
tems. The final requirement of this process release 
was the continued segregation of the well etch pro- 
cess and the metal 1 contact etch process. The 
metal 1 contact etch process remained on the batch 
etcher. 
Figure 1 P-gate Area Defect Density Levels
as a Function of Etcher Type 

Reduction in Metal 2 Short Circuits 

The implementation of the CMOS-4 metal process 
into manufacturing brought a new set of chal- 
lenges. The estimated yield impact of the metal 2 
short circuits on the first set of full-process lots 
was approximately 40 percent, based on a single-process-step
defect density (D0i) level of approximately
40 defects per 100-meter (m) length.

Equipment-induced Short-circuiting
Mechanisms 

PWP data on two sets of process equipment empha- 
sized the need for concentrated particle reduction 
efforts. These two systems were metal deposition 
and dielectric deposition. Task forces, with mem- 
bers from defect reduction, process engineering, 
and equipment engineering, were organized and 
chartered with reducing these PWP numbers. 

Metal Deposition System Upgrades A known
yield limiter of the CMOS-3 process was the presence
of a high number of titanium nitride (TiN) particles
on the wafer surface. Information obtained
from the equipment vendor and confirmed by
Digital's semiconductor manufacturing fabrication
plant in Scotland indicated that a new planar titanium
target would provide a cleaner TiN film, thus
decreasing metal short circuits and enhancing
yield. The planar target, a rotating magnetic experimental
(RMX) cathode, was able to decrease the
yield loss attributed to TiN particles by 75 percent.
Since several TiN layers were used in the CMOS-4 
process, fewer particles on TiN film would defi- 
nitely benefit the overall yield. 

The PWP data in Figures 2 and 3 show how the 
particle levels dropped once the RMX cathode was 
installed. Data obtained on one of the initial RMX 
split lots showed a 50 percent reduction in metal 
short circuits. A substantial number of confirma- 
tion lots were processed to examine metal short cir- 
cuits and electrical test data. Once the lots were 
analyzed, the decision was made to release the RMX 
cathode into production and an improvement in 
metal 2 short-circuit levels was realized. 
Figure 2 TiN Particles before Installation
of RMX Cathode

Figure 3 TiN Particles after Installation
of RMX Cathode

Dielectric Deposition System Particle Reduction 
The dielectric deposition systems had the highest 
particle levels of any equipment in the fabrication 
area, as shown in Figure 4. Since dielectric particles 
can induce metal short circuits, and metal short cir- 
cuits typically impact yield more than any other 
defect structure, a concerted effort was made to 
improve the particle stability. The four major areas
of change were (1) the installation of a new type 
of O-ring, (2) the initiation of a continuous pump 
ballast, (3) pressure adjustments in the load lock, 
and (4) modifications to the "clean" recipes. These 
changes improved the average PWP count from 
537.7 to 2.6 particles greater than 0.375 µm, as
shown in Figure 5. 




Figure 4 Initial Dielectric Particle Baseline
(mean: 537.680)

Figure 5 Improved Dielectric Particle Baseline

Process-induced Short-circuiting
Mechanisms

Once the equipment PWP data improved to the levels
shown in Figures 3 and 5, and the metal 2 short-circuit
levels fell to approximately 10 defects per
100-m length, two systematic defects were uncovered:
corrosion and grain-boundary stringers. Both
defect types were noted during the inspection of
failed sites on the test chip structure. The defects
appeared intermittently (not all lots were affected
to the same extent), and were clustered around the 
wafer edge. The metal 2 short circuits consistently 
were twice as high as the metal 1 short circuits; 
therefore, processes unique to the metal 2 process 
sequence were evaluated first. A lot history investi- 
gation performed on all lots exhibiting corrosion or 
grain-boundary stringers found no equipment com- 
monalities, confirming possible process-induced 
mechanisms. Several processes unique to the metal
2 process flow include the intermetal dielectric, the
metal 2 cutout sequence (used to remove metal 
from alignment targets allowing metal 2 alignment 
without manual intervention), and the tungsten 
process sequence. 

Corrosion Figure 6 shows the corrosion on a 
metal line in the test circuit. A common cause of 
corrosion is the interaction of chlorine with mois- 
ture. The tungsten etch-back process was inves- 
tigated because it uses chlorine as an etch gas. A 
short-loop monitor wafer experiment was designed 
to study the effects of various post-tungsten etch- 
back processing procedures on corrosion. The 
level of corrosion was determined by means of a 
patterned wafer defect detection tool. Practices 
commonly used in the semiconductor industry to 
eliminate chlorine-induced corrosion are a fluorine 
passivation process and/or an immediate water 
rinse. The data in Figure 7 shows that, without 
an immediate water rinse, the corrosion counts 
increase with time. After only 90 minutes, the rate 
of increase changed dramatically. Initially, the
incorporation of a rinse provided more stability,
but the corrosion levels were unacceptable after
just 11 hours. The addition of a 5-second sulfur 
hexafluoride passivation process, however, com- 
pletely prevented any corrosion. 

Figure 6 Corrosion on a Metal Line

Figure 7 Particle Counts of Bare Aluminum
Wafers as a Function of Time after
Tungsten Etch Back

To verify this data on test chips, several lots were
split. The results from one of the splits are shown
in Figure 8. The metal 2 short-circuit levels
improved from 11.2 defects per 100 m to 4.1 defects
per 100 m. Additional splits performed on product
and test lots confirmed that the sulfur hexafluoride
passivation process was as good as, if not better
than, the original process. The electrical results
verified that the defects were a wet type of corrosion
formed on metal 1 in the bond pads and guard
rings and were caused by exposure to chlorine
during the tungsten etch-back process at tungsten
plug 2.

Figure 8 Results from Split Lot Test of Metal 2
Short Circuits

Grain-boundary Stringers A short-loop lot was 
designated to look for potential contributors to the 
grain-boundary stringers. It examined three factors: 
the dielectric film, the cutout process sequence, 
and the metal 2 deposition process. Results of the 
lot are shown in Figure 9. They indicate that the 
most significant factor affecting grain-boundary 
stringers was the cutout process sequence. Since 
the grain-boundary stringers were seen on an inter- 
mittent basis, the split was purposely designed to 
exaggerate the impact of the cutout process. This 
was achieved by reducing the metal overetch and 
processing the designated wafers through the 
photolithography and strip cycles twice. The metal 
2 short circuits on these wafers were dramatically
higher than those on the wafers that did not receive
a cutout process: 27 defects per 100 m with the cutout
process versus 1.4 defects per 100 m without it. An example of a
grain-boundary stringer is shown in Figure 10. 

Figure 10 Photomicrograph of a
Grain-boundary Stringer

Since processing without a cutout process was
not an option, a split lot was designed to study various
versions of the cutout strip process. Although
the electrical results of the split were not conclusive
(all defect levels were excellent), a scanning
electron microscope (SEM) analysis of the same die
location on all splits showed a significant difference 
among splits. All splits incorporating a wet solvent 
had evidence of grain-boundary stringers, whereas 
the splits stripped only in a downstream plasma 
had none of the stringers. Therefore the cause of 
the grain-boundary stringers was the reaction of 
the grain boundaries with the solvent and subse- 
quent water treatments. 

The purpose of the solvent/water portion of the 
strip process was to assist in the removal of photoresist
that had been plasma-hardened by the metal
etcher. By transferring the metal etch process 
from a plasma etcher to a wet sink, the option of a 
downstream plasma strip became available. Several 
additional splits were processed to study the com- 
bination of a wet chemical cutout etch followed by 
a downstream plasma strip. They confirmed that 
the elimination of the interaction between the 
grain boundaries and the solvent/water combina- 
tion resulted in a reduction in grain-boundary 
stringers. Metal 2 short circuits on one of the lots 
improved from 5.1 defects per 100 m to 3.2 defects 
per 100 m and showed a corresponding improve- 
ment in yield. 

From the initial stages of CMOS-4 manufacturing 
up to the present, an overall improvement in metal 
2 defect levels has been seen. The test chip metal 2 
short-circuit D0i levels have diminished from
approximately 40 defects per 100 m to approximately
4 defects per 100 m. Corresponding metal 2
visual defect inspection data has decreased from
approximately 1.5 defects per cm² to approximately
0.2 defects per cm².
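
A rough sense of what this tenfold reduction means for yield can be obtained by applying the Poisson model to line length instead of area. The sketch below rests on assumptions that the paper does not state: that the quoted "yield impact of approximately 40 percent" is a yield loss, and that the loss follows 1 - exp(-L * D), where L is an effective critical length of metal 2 on the chip. The function names are ours.

```python
import math


def yield_loss(critical_length_m, defects_per_100m):
    """Poisson yield loss for a line-length-based defect density."""
    return 1.0 - math.exp(-critical_length_m * defects_per_100m / 100.0)


def implied_critical_length_m(loss, defects_per_100m):
    """Effective critical length that would produce the given loss."""
    return -math.log(1.0 - loss) * 100.0 / defects_per_100m


# Back out the effective length from the starting point quoted in the text:
# about 40 percent yield impact at about 40 defects per 100 m.
length = implied_critical_length_m(0.40, 40.0)
print(f"implied critical length: about {length:.2f} m")

# Apply the same length to the improved defect density of 4 per 100 m.
print(f"estimated loss at 4 per 100 m: about {yield_loss(length, 4.0):.0%}")
```

Under these assumptions the metal 2 yield loss would fall from about 40 percent to roughly 5 percent.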

Future Considerations 

For the successful production of future generations 
of CMOS technology, improvements need to be 
made in the areas of general microcontamination, 
wafer handling, defect review, and data manage- 
ment. The needed improvements are outlined 
below. 

Microcontamination 

Initial results from particle studies on PWP measurements
from current equipment indicate that 70 percent
of the total particles are between 0.25 µm and
0.375 µm. As discussed previously, once deposited
on wafers, these small particles are very difficult to 
remove. Efforts must be made to isolate and elimi- 
nate the source of these particles before the equip- 
ment is introduced into manufacturing. To provide 
early identification, in-situ particle measurement 
for process tools needs to be incorporated as a part 
of the tool specification. 

In addition, as geometries shrink and gate oxides 
become thinner, the importance of understanding 
the effect of nonparticulate contamination such 
as zinc, sodium, and hydrocarbons on device yield 
is critical. The cost of reducing such contaminants 
from the wafer environment is astronomical. We 
must learn which contaminants are harmful and at 
what level. Surface analysis techniques, such as 
TXRF and Auger electron spectroscopy, need to be
incorporated into the manufacturing environment 
to provide the necessary data. 

Wafer Handling 

As die size continues to increase, wafer size also 
increases. As technology moves toward 200- and 
300-mm wafers, manual handling of wafers is all
but impossible. One broken wafer will result in 
the loss of thousands of dollars. Automated han- 
dling of wafers must be incorporated across each 
process step. 

Defect Review 

As the killer defect size continues to decrease, 
visual inspections with optical microscopes will 
lose their value. As many as 50 percent of the electrically
failed combs on our 0.5-µm process are not
observed during optical microscopic review. This 
has highlighted the need for two things: test chips 
designed for failure analysis and easy-to-use in-line 
inspection SEMs. 

Test chips can no longer be designed without the 
active involvement and input of defect reduction 
personnel. A test chip must not only be used to 
prioritize defect reduction efforts, but it must also
help to determine and isolate defect sources. A test 
chip that is difficult to inspect provides only half 
the information needed to reach the ultimate goal 
of high yield. 

In-line inspection SEMs are required to review 
defects found during process inspections and to 
analyze process problems. They should be capable 
of energy dispersive X-ray spectroscopy (EDXS) 
analysis to provide information on the elemental 
components of the defects. In addition, the transfer 
of electrical test data to an SEM is required so that 
failing locations can be easily reviewed. 

Data Management 

The future of defect reduction efforts depends on 
the ability to manage and analyze large quantities of 
data. Defect inspection tools perform full wafer
inspections in less than five minutes. A thorough 
understanding of statistical methods, such as con- 
trol charts and sampling procedures, is required 
to determine the extent of defect review efforts. 
Automatic storage of images on optical disks for 
subsequent review is another key area. Efforts are 
ongoing with Digital's Campus-based Engineering
Center in Vienna to determine the benefits of using 
fractal analysis on airborne particle data. 
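
As an illustration of the kind of control chart mentioned above, the sketch below computes limits for a c-chart, a standard chart for count data such as particles per wafer. It is a generic textbook construction (center line at the mean count, limits at plus and minus three times the square root of the mean), not a description of the charts used in the fabrication area, and the counts are hypothetical.

```python
import math
from statistics import mean


def c_chart_limits(baseline_counts):
    """Return (lower limit, center line, upper limit) for a c-chart."""
    c_bar = mean(baseline_counts)
    sigma = math.sqrt(c_bar)            # Poisson counts: variance equals mean
    return max(0.0, c_bar - 3 * sigma), c_bar, c_bar + 3 * sigma


def out_of_control(new_counts, baseline_counts):
    """Indexes of new counts that fall outside the baseline limits."""
    lower, _, upper = c_chart_limits(baseline_counts)
    return [i for i, c in enumerate(new_counts) if c < lower or c > upper]


# Hypothetical particle counts per wafer pass.
baseline = [3, 2, 4, 1, 3, 2, 3, 2]
new = [2, 3, 9, 1]

print(c_chart_limits(baseline))
print(out_of_control(new, baseline))
```

Only the points flagged by such a chart would need a full defect review, which is one way to limit the review effort.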



Summary 

The success or failure of a semiconductor produc- 
tion facility lies in the ability to control contam- 
ination and reduce defects. Efforts are under way 
to improve Digital's ability to obtain high yields
on current and future microprocessors. Semi- 
conductor companies must join with inspection 
and analytical equipment manufacturers, as well as 
with research institutes, to develop the required 
tools to support future technologies. 
A Yield Enhancement Methodology
for Custom VLSI Manufacturing 

Integrated circuit yield enhancement is a complex issue due to the many steps 
involved in the manufacturing process and the number of variables governing the 
overall yield. The task is further compounded by industry technology goals for con-
tinually improving performance achieved by decreasing minimum feature size, 
increasing chip area, and incorporating more on-chip functionality from genera- 
tion to generation. In the final analysis, the cost of producing chips is directly related 
to the yield, hence the necessity for a comprehensive yield improvement strategy. 



The industry technology goal for continually 
improving complementary metal-oxide semi- 
conductor (CMOS) very large-scale integration 
(VLSI) chip performance has been achieved by 
decreasing minimum feature size and incorporating 
more on-chip functionality in a larger chip area. 
At Digital, four generations of CMOS technology 
have been developed. Each generation possesses 
successively more complex manufacturing pro- 
cesses as well as more individual process steps to 
fabricate the chips. These complex processes and 
additional steps have increased the number of vari- 
ables that have the potential for affecting yield. 

Digital's Alpha 21064 microprocessor chip 1 has a 
peak operating frequency of 200 megahertz (MHz), 
contains 1.7 million transistors, and has chip dimensions
of 1.68 × 1.39 square centimeters (cm²). The
Alpha microprocessor is the highest performance, 
highest density, and largest chip currently manufac- 
tured in volume by Digital. With these chip com- 
plexities, achieving chip yield goals is imperative 
for successful cost-effective manufacturing. 

In the manufacture of integrated circuits, yield is 
defined as the fraction of the total number of die
sites introduced into processing that are completed 
as fully functional chips. The cost of producing 
a fully functional chip is inversely proportional 
to the yield. In the CMOS-4 manufacturing line, 
22 wafers are processed through all the steps as a 
single lot. On a lot basis, 

\text{Cost of chip} = \frac{\text{Cost of lot}}{\text{Yield of lot} \cdot \text{Total number of die sites in lot}}

Hence, improving the yield directly reduces
the cost of production per chip. This motivation
for yield enhancement applies to all product
chips for the life of the respective chip.
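
The lot-basis cost relation above can be illustrated with a minimal sketch. The 22 wafers per lot comes from the text, and 54 die sites per wafer is the figure quoted later for a product wafer; the lot cost and the two yield values are made-up numbers used only to show the inverse relationship.

```python
def cost_per_good_chip(lot_cost, lot_yield, die_sites_in_lot):
    """Cost of one fully functional chip on a lot basis.

    cost = lot_cost / (lot_yield * die_sites_in_lot)
    """
    if not 0.0 < lot_yield <= 1.0:
        raise ValueError("lot_yield must be in (0, 1]")
    if die_sites_in_lot <= 0:
        raise ValueError("die_sites_in_lot must be positive")
    return lot_cost / (lot_yield * die_sites_in_lot)


# 22 wafers per lot and 54 die sites per wafer are taken from the text;
# the lot cost of 100000.0 is a hypothetical value.
die_sites = 22 * 54
low_yield_cost = cost_per_good_chip(100_000.0, 0.10, die_sites)
high_yield_cost = cost_per_good_chip(100_000.0, 0.20, die_sites)

# Doubling the yield halves the cost per good chip.
print(f"{low_yield_cost:.2f} vs {high_yield_cost:.2f}")
```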

Once the product chip is introduced, demand 
for the chip increases up to some level and then 
declines to the end of its life. During a product's 
development, circuit and system design teams 
require quantities of chip prototypes for design veri- 
fication and debug purposes. The rapid increase in 
demand after product introduction requires a steep 
yield learning curve and hence rapid yield enhance- 
ment to supply production quantities. 

This paper discusses the yield enhancement 
methodology used to evaluate processing, process 
equipment, manufacturing methods, design, and 
testing in relation to yield. It describes the yield test 
vehicles, designed experiments, yield models, and 
special-purpose analytical tools to identify and pri- 
oritize defects and to focus resources on appropri- 
ate defect inspection and failure analyses tasks. 
Before our discussion of the specific techniques, 
we present a brief overview of the methodology 
to enhance yield. 

Overview of Yield Methodology 

The yield enhancement methodology applied to 
chip manufacturing at Digital involves the use of 
test chip data, product chip data, static random- 
access memory (SRAM) data, yield models, special- 
purpose analytical tools, and defect inspection to 
perform defect identification and prioritization. 
The information gained is relayed to process engi- 
neering for rapid yield learning and the design of 
experiments for yield improvement. In addition, 
yield, development, and design teams use the 
information to estimate the manufacturability of
future products. 

Figure 1 presents the overall methodology and 
the relationships of the various stages of yield anal- 
ysis. After a chip has been designed, its design is 
converted into a chip layout to produce the reticle 
set for the chip. Test chip wafers from process manufacturing
are electrically tested. Parametric data,
defect structure data, and SRAM test vehicle functionality
data are measured on test chips. The layout
information for the chip, the measurements
from defect structures on the test chip, and SRAM 
functional yield are used to model yield. 

The yield model generates a priority list (defect 
pareto) of the yield-limiting defect mechanisms for
the layout of that particular chip. If the layout of 
the chip is determined to be especially sensitive to 
a particular defect type, the chip layout can be 
altered to optimize for higher yield without chang- 
ing the design functionality. In addition, in the early 
stages of design, yield information from a previous 
generation technology can be used to forecast the 
yield of chips to be manufactured in a succeeding 
generation of technology. The effects of choosing 
different redundancy schemes can also be forecast. 



Product wafers from process engineering are 
also electrically tested, and parametric data and 
product functionality data are measured on these 
wafers. The results of measurements at electrical 
test, the lot process records, and defect inspection 
data undergo parametric and product yield analyses 
to determine the factors affecting yield. All results 
of analyses are used by process engineers for exper- 
imental design and yield learning. The use of test 
chips is one of the starting points in early process 
defect learning and is described next. 

Test Chip 

Test vehicles and structures on the test chip pro- 
vide the process and electrical information needed 
to estimate chip yield. The CMOS-4 yield test chip is 
shown in Figure 2. The chip is divided into three 
functional areas: 

1. Defect density test structures 

2. 128 kilobit (Kb) SRAM 

3. Scribe lane structures 

The defect density structures are used to determine 
the component defect densities for the integrated 
Figure 1 Yield Methodology Flow Diagram

Figure 2 Yield Test Chip for the CMOS-4 Process



process, and the scribe lane structures are used to 
determine the parametric yield. Since the SRAM and 
the defect density structures are both on the chip, 
the SRAM failure modes can be correlated with the 
component defect levels measured on the defect 
density structures. In addition, with the compo- 
nent defect densities factored into the yield model, 
a defect pareto can be generated that prioritizes the 
defect mechanisms according to their yield-limiting 
effect on the SRAM. 

The defect density structures have the following 
qualities: 

1. Simple design, with large area structures for 
determining specific component defect densities 

2. Testable in-line 

3. Compatible with the yield model 

4. Inspectable in-line 

The CMOS-4 technology integrates approximately 
250 process modules on a single complex chip. 1 
If a process aberration or defect renders a chip 
inoperable, it is very difficult to diagnose at which 
process step the defect occurred. 

The defect structures partition the integrated 
process into critical physical process features that 
are more easily characterized than an entire com- 
plex chip. A number of defect structures are 
designed; each one is designed to detect a specific 



defect type. The complete set of defect structures 
combines to represent the full process.

The defect structures test for the following gen- 
eral electrical faults: 

1. Intralayer short circuits and open circuits 

2. Interlayer short circuits 

3. Contact/via chain open circuits 

Intralayer short circuits and open circuits are mea- 
sured using snake/comb structures. For example, 
Figure 3 shows a schematic diagram of a metal 1 
(M1) snake/comb structure that tests for M1 short
circuits and open circuits. Testing the continuity of
the snake detects open circuits in the M1 line.
Testing for continuity between the snake and
combs detects M1 short circuits.

Figure 4 shows an example of an interlayer short-circuit
test using an M1 to metal 2 (M2) capacitor.
A test for continuity between M1 and M2 conductor
plates detects short circuits in the dielectric layer
between M1 and M2.

Contact/via integrity is tested using chains of 
contacts or vias. Figure 5 shows how an M2 to M1
via chain is tested for continuity to check for any 
chain open circuits. 

Table 1 lists the yield test chip defect structures 
and respective electrical faults or defect mecha- 
nisms that they detect. 

The test structures must be testable in-line, that 
is, they must allow electrical tests to be performed 

Figure 3 Metal 1 Snake/Comb Structure

after M1, M2, or metal 3 (M3) patterning. Consequently,
structures that do not require process
steps after M1 must be designed only in those layers
up to and including M1. For example, M1 to polysilicon
contact chains are connected to pads by M1
and not by M1 to M2 vias in conjunction with M2.
The former chain is testable in-line after M1 patterning;
the latter has to continue processing to be
tested after M2 patterning.

Figure 4 Metal 2 over Metal 1 Capacitor: (a) Top View, (b) Side View

Figure 5 Metal 2 to Metal 1 Via Chain

Vol. 4 No. 2 Spring 1992 Digital Technical Journal

Table 1 Defect Structures and Defect Types

Structure                                        Electrical Fault to Detect

Dielectrics
M3/M2/M1/polysilicon/substrate capacitor         Gate oxide, dielectrics 1, 2, 3 short circuits
M3/M2/M1/polysilicon/well capacitor stack        Gate oxide, dielectrics 1, 2, 3 short circuits
Bird's beak structure: polysilicon plate over    Field/active area periphery short circuits
  minimum pitch active area stripes in well
Bird's beak structure: polysilicon plate over    Field/active area periphery short circuits
  minimum pitch active area stripes in substrate

Contacts
M1 to n+ contact chain                           M1 to n+ open circuits
M1 to p+ contact chain                           M1 to p+ open circuits
M1 to n+ polysilicon contact chain               M1 to n+ polysilicon open circuits
M1 to p+ polysilicon contact chain               M1 to p+ polysilicon open circuits
M2 to M1 contact chain                           M2 to M1 open circuits
M3 to M2 contact chain                           M3 to M2 open circuits
Polysilicon to local interconnect to n+ chain    Polysilicon to local interconnect to n+ open circuits

Interconnect
Polysilicon snakes and combs                     Polysilicon open circuits and short circuits
M1 snakes and combs                              M1 open circuits and short circuits
M2 snakes and combs                              M2 open circuits and short circuits
M3 snakes and combs                              M3 open circuits and short circuits
N+ snakes and combs                              N+ open circuits and short circuits
P+ snakes and combs                              P+ open circuits and short circuits
Local interconnect to n+ combs                   Local interconnect to n+ short circuits
N-channel gate meander                           N-gate meander short circuits
P-channel gate meander                           P-gate meander short circuits
Local interconnect snakes and combs              Local interconnect open circuits and short circuits
Local interconnect to n+ polysilicon combs       Local interconnect to n+ polysilicon short circuits

The structures were designed with minimum 
pitch metal lines and minimum surrounds for con- 
tacts to give them the greatest sensitivity to defects. 
In addition, the structures covered an area large 
enough to detect the minimum defect density 
desired. The yield model required that the yield test 
chip contain four or more instances of each type of 
defect structure to determine the clustering param- 
eter in the yield model. 

The structures are integrated onto the chip to 
facilitate manual or automated visual inspection of 
failing test sites. For example, in the process flow, 
M1 is patterned several process steps after polysilicon
is patterned. To keep the polysilicon layer in
view, some polysilicon snake/comb structures
must not be covered with any M1 structures.
Should a failing polysilicon comb be found at M1



electrical test, that failing comb can then be visu- 
ally inspected to identify the type of defect causing 
the comb to fail. 

The scribe lane contains the minimum set of 
electrical test and process monitor structures 
required to characterize and monitor the process in 
a manufacturing mode. The scribe lane exists on all 
chips. Table 2 lists parameters measured from the 
scribe structures. If a critical parameter does not 
comply with its specification on more than a cer- 
tain number of die sites on a wafer, that wafer is 
rejected. Hence, scribe lane structures determine 
parametric yield loss. 

Electrical Failure Specifications 
To estimate the defect density from an electrical 
test structure monitor, specification limits must 
be established that determine a fault. Usually, the 
DC parametric testing applies either a voltage or a 
Table 2 Scribe Lane Electrical Test Parameters

Transistor threshold voltages 
Transistor saturation region currents 
Transistor leakage currents 
Transistor effective channel lengths 
Diode breakdown voltages 
Field transistor threshold voltages 
Interconnect sheet resistances 
Gate oxide thickness 
Contact resistances 
Dielectric leakage currents 



current value. Either the leakage current or volt- 
age is then recorded or the resistance is computed 
on metal interconnect vias, on contact chains, or on 
serpentine lines. A fault at a gate capacitor is 
recorded if the level of current passed after a voltage
ramp is 1 microampere (µA) or more. The fault-level
specifications are periodically reviewed and
changed as necessary. 
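
As a rough illustration of how such specification limits turn measurements into recorded faults, the sketch below applies the 1 µA gate capacitor limit from the text, plus resistance limits for a snake/comb structure. The function names and the resistance limit values are hypothetical; the text does not give the actual limits used for interconnect structures.

```python
GATE_LEAKAGE_LIMIT_A = 1e-6  # 1 microampere, the limit quoted in the text


def gate_capacitor_fault(leakage_current_a, limit_a=GATE_LEAKAGE_LIMIT_A):
    """A gate capacitor fault is recorded at or above the leakage limit."""
    return leakage_current_a >= limit_a


def snake_comb_faults(snake_resistance_ohm, snake_to_comb_resistance_ohm,
                      open_limit_ohm, short_limit_ohm):
    """Classify a snake/comb test site.

    A snake whose end-to-end resistance is too high is an open circuit;
    a snake-to-comb resistance that is too low is a short circuit.
    Both limits are assumed values supplied by the caller.
    """
    return {
        "open": snake_resistance_ohm > open_limit_ohm,
        "short": snake_to_comb_resistance_ohm < short_limit_ohm,
    }


# Hypothetical measurements and limits, for illustration only.
site = snake_comb_faults(
    snake_resistance_ohm=5.0e3,
    snake_to_comb_resistance_ohm=2.0e2,
    open_limit_ohm=1.0e6,
    short_limit_ohm=1.0e5,
)
print(site, gate_capacitor_fault(2.5e-6))
```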



SRAM Analysis for Process Fault Signatures 
The SRAM is a useful circuit vehicle for yield analy- 
sis. On the yield test chip, the probe yield of the 
SRAM measures the capability of the full, integrated 
process to yield product. The regular array of the
SRAM and its bit mapping capability allow some cor- 
relation of failure modes to defect mechanisms. The 
use of SRAMs processed with the defect test struc- 
tures allows the determination of process defects 
that affect circuit failures. A relatively large SRAM 
array (i.e., 128Kb or larger) typically captures most 
of the faults within the memory cells, which com- 
prise up to 90 percent of the memory chip layout. 
These cells are tested after the basic continuity and 
leakage requirements have been met during the 
SRAM wafer level testing. Since the memory cell lay- 
out is a regular array, certain defect mechanisms 
have specific functional failure patterns within the 
circuit. Figure 6 is a typical bit fail map showing 
the types of patterns analyzed with pattern recognition
techniques. 2 These patterns were analyzed for
their signature of possible defect mechanism with
a probability of failure (POF) matrix. 3

Table 3 gives the POF matrix for a 128Kb SRAM.
The rows of the matrix are the individual defect 
types monitored in the process using the special 
test structures described above. The columns of the 
matrix are the types of SRAM failure patterns gath- 
ered from pattern recognition programs. Each cell 
in the matrix is the average probability that a defect 
of that type will cause a circuit fault type. This analysis
was originally done using Monte Carlo simulation
techniques on the memory cell using the
VLASIC yield simulator. 4,5 This matrix is currently
being used to compile defect statistics from the 
defect-sensitive test structures together with the 
SRAM pattern fail data. 
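
One way to read the POF matrix is as a linear map from defect counts to expected counts of each failure pattern. The sketch below shows that reading with three rows of Table 3; the column order (partial row, row, double column, single column, partial column) is an assumption based on the table header, and the defect counts in the example are made up.

```python
FAULT_TYPES = [
    "partial_row", "row", "double_column", "single_column", "partial_column",
]

# Three rows copied from Table 3; the column order is assumed to follow
# FAULT_TYPES above.
POF = {
    "M1 short circuit": [0.59000, 0.0000, 0.17410, 0.20453, 0.00000],
    "M2 short circuit": [0.00000, 0.0000, 0.73221, 0.22870, 0.00000],
    "M2 open circuit":  [0.00000, 0.0000, 0.12100, 0.21950, 0.63900],
}


def expected_fault_counts(defect_counts):
    """Expected number of each SRAM failure pattern.

    defect_counts maps a defect type to the number of such defects
    landing in the memory array (hypothetical input data).
    """
    totals = [0.0] * len(FAULT_TYPES)
    for defect, count in defect_counts.items():
        for k, probability in enumerate(POF[defect]):
            totals[k] += count * probability
    return dict(zip(FAULT_TYPES, totals))


# Hypothetical defect counts for one lot.
expected = expected_fault_counts({"M2 short circuit": 10, "M2 open circuit": 2})
print(expected)
```

In this reading, a lot dominated by M2 short circuits would be expected to show mostly double and single column failures, which matches the column failure example discussed in the text.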

An example of SRAM analysis shows the yield and 
failure diagnosis of column failures. Since the SRAM 
columns are designed in the M2 interconnect layer, 
the level of M2 to M2 short circuits measured from 
the defect test monitors correlates to the level of 
column failures. Figure 7 shows the level of single 
and double column failures and the level of lateral 
M2 short circuits on a per lot basis. SRAM analysis is 
used because the defects accumulated during the 
manufacturing process do not have a strong electri- 
cal fault signature on a large, complex microproces- 
sor circuit. 

The yields of the SRAM chips are typically com- 
pared to those forecast with the yield model. The 
test structures described earlier are used collec- 
tively in the yield model, which is discussed in the 
following section. 
Figure 7 Comparison of SRAM Single and Double Column Failures to Metal 2 Short Circuits



Single Layer Yield Model 

The single layer yield model identifies the defect 
types that have the greatest impact on yield. Proc- 
ess engineering uses this information, known as the 
defect pareto, to improve the yield by designing 
experiments to reduce the defect density of the 
highest priority defects. Design engineering uses 
this information to lay out product chips so that 
they are less sensitive to these defect types. 



Table 3 128Kb SRAM Probability of Failure Matrix

                                 Fault Type
                                 Partial   Row      Double    Single    Partial
                                 Row                Column    Column    Column
Defect Type                      Fails     Fails    Fails     Fails     Fails

Dielectric 1 (flat)              0.20000   0.0000   0.00000   0.00000   0.00000
Dielectric 2 (worst-case step)   0.00000   0.0000   0.00000   0.95000   0.00000
Dielectric 2 (flat)              0.00000   0.0000   0.00000   0.05000   0.00000
Polysilicon short circuit        0.15640   0.0242   0.02160   0.18315   0.00000
M1 short circuit                 0.59000   0.0000   0.17410   0.20453   0.00000
M2 short circuit                 0.00000   0.0000   0.73221   0.22870   0.00000
Polysilicon open circuit         0.19602   0.0681   0.00000   0.00000   0.00000
M1 open circuit                  0.36310   0.6020   0.00000   0.00000   0.00000
M2 open circuit                  0.00000   0.0000   0.12100   0.21950   0.63900
N-gate area                      0.00000   0.0000   0.00000   0.00000   0.00000
P-gate area                      0.00000   0.0000   0.00000   0.00000   0.00000
The single layer yield model uses the negative
binomial yield equation (1) to predict the yield
impact on a given product for each defect type. 6,7,8

$$\mathrm{yield}_i = \left(1 + \frac{A_{c,i}\, D_{0,i}}{\alpha_i}\right)^{-\alpha_i} \qquad (1)$$

where $A_{c,i}$ is the critical area of the chip, $D_{0,i}$ is the defect
density, $\alpha_i$ is a measure of the spatial distribution of
defects, 9 and $i$ is an index used to denote a specific
defect type.

The critical area of the chip is the area that is 
most susceptible to a particular type of defect. 
Critical area can be measured in various units, 
including square centimeters, meters, and numbers 
of contacts. The defect density is the number of 
defects per unit area where the units of area are the 
same as the critical area. Defect density is indepen- 
dent of product type, and depends only on the 
cleanliness of the fabrication. The spatial distribu- 
tion of defects is independent of product type 
(depending on chip size 10 ) and depends to some 
degree on the cleanliness of the fabrication. 11 
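
The negative binomial yield equation is straightforward to evaluate numerically. The sketch below is a minimal version that assumes the critical area and defect density are expressed in matching units, as the text requires; the Poisson comparison is added only to show the well-known limiting behavior for large $\alpha$ and is not part of the model described here.

```python
import math


def negative_binomial_yield(critical_area, defect_density, alpha):
    """yield_i = (1 + A_c * D_0 / alpha) ** (-alpha) for one defect type."""
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    return (1.0 + critical_area * defect_density / alpha) ** (-alpha)


def poisson_yield(critical_area, defect_density):
    """Limit of the negative binomial yield as alpha grows large."""
    return math.exp(-critical_area * defect_density)


# Same defect count (A_c * D_0 = 1.0) with strong and weak clustering.
clustered = negative_binomial_yield(1.0, 1.0, alpha=0.1)
uniform = negative_binomial_yield(1.0, 1.0, alpha=1.0e6)
print(clustered, uniform, poisson_yield(1.0, 1.0))
```

For the same average number of defects, a small $\alpha$ (clustered defects) gives a higher modeled yield than a large $\alpha$, because clustered defects tend to land on the same few die.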

Critical Area_i Extractor

Integrated circuits are designed using computer-aided
design (CAD) technology. The geometries that
eventually become the wires, transistors, resistors, 
and other circuit elements are stored in a layout art- 
work file. Design Rule Check software calculates 
critical areas for different defect types. The algo- 
rithms for this usually involve Boolean algebraic 
operations and sizing of layout geometries. 

One example of a critical area extraction is to 
define a temporary layer to be the intersection
of the M1, polysilicon, and M1 contact layers. The
number of geometries on this temporary layer is the
total number of M1 to polysilicon contacts on the
product chip. The critical area of the M1 to polysilicon
open-circuit defect type is related to this
number (there are additional steps to the algorithm 
that eliminate counting redundant contacts). Other 
algorithms are used to calculate the critical areas 
for the other defect types. 
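
The Boolean layer operation in the contact example can be sketched with axis-aligned rectangles. This is a simplified illustration, not the Design Rule Check software itself: the `Rect` type, the function names, and the sample layout are hypothetical, overlapping shapes on one layer are not merged, and the step that removes redundant contacts is omitted.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Overlap of two rectangles, or None if they do not overlap."""
    x0, y0 = max(a.x0, b.x0), max(a.y0, b.y0)
    x1, y1 = min(a.x1, b.x1), min(a.y1, b.y1)
    if x0 < x1 and y0 < y1:
        return Rect(x0, y0, x1, y1)
    return None


def layer_and(layer_a, layer_b):
    """Boolean AND of two layers given as lists of rectangles."""
    result = []
    for a in layer_a:
        for b in layer_b:
            overlap = intersect(a, b)
            if overlap is not None:
                result.append(overlap)
    return result


def count_m1_poly_contacts(m1, poly, m1_contact):
    """Geometries on the temporary layer M1 AND polysilicon AND contact."""
    temporary_layer = layer_and(layer_and(m1, poly), m1_contact)
    return len(temporary_layer)


# Hypothetical layout: one M1 line crossing two polysilicon lines, with
# two contacts at the crossings and one unrelated contact elsewhere.
m1 = [Rect(0, 0, 10, 2)]
poly = [Rect(0, 0, 2, 10), Rect(6, 0, 8, 10)]
contacts = [Rect(0.5, 0.5, 1.5, 1.5), Rect(6.5, 0.5, 7.5, 1.5), Rect(20, 20, 21, 21)]
print(count_m1_poly_contacts(m1, poly, contacts))
```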

Table 4 gives the units of critical area for the four 
basic defect types. These area extractions are made 
on both the test pattern and the product chip and 
are used in the negative binomial yield equation as 
the critical areas. 

Table 4 Critical Areas by Defect Type

Defect Type                                  Units of Critical Area

Interlayer open circuits (contact chains)    Number of nonredundant contacts
Interlayer short circuits (caps)             Area of overlap between sequential interconnecting layers
Intralayer open circuits (snakes)            Hundreds of meters of minimum width interconnect
Intralayer short circuits (combs)            Hundreds of meters of minimum spacing interconnect

Computation of Alpha_i

The negative binomial yield model requires the calculation
of a parameter that is usually referred to as
$\alpha$. As stated previously, $\alpha_i$ is a measure of the spatial
distribution of defects on a wafer. 9 Small values of $\alpha$
indicate that defects are more likely to be clustered
in isolated areas on the wafer. High values of $\alpha$ indicate
that defects occur across the wafer in a more
uniform fashion. A single test pattern has multiple
copies of the same test structure on it. Each of these
test structures is independently testable so that the
number of failing test structures on a test pattern
can be counted. From this data, a distribution can
be created of number of occurrences as a function
of number of failing test structures. The mean and
variance of this distribution can be computed and
then $\alpha$ can be calculated from the following
equation (2).

$$\alpha_i = \frac{m_i^2}{v_i - m_i} \qquad (2)$$

where $m_i$ is the average number of failing test structures
per test pattern, $v_i$ is the variance of the number of failing
test structures per test pattern, and $i$ is an index
indicating the particular defect type.
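
A minimal sketch of the alpha computation from per-pattern failure counts follows. Two choices here are assumptions rather than statements from the text: the population variance is used (the text does not say whether a sample or population variance is intended), and a variance no larger than the mean is treated as "no measurable clustering" by returning infinity.

```python
from statistics import mean, pvariance


def cluster_alpha(failing_counts):
    """alpha_i = m**2 / (v - m) from failing-structure counts per test pattern.

    failing_counts holds, for each test pattern, the number of failing
    copies of one type of test structure.
    """
    m = mean(failing_counts)
    v = pvariance(failing_counts)  # population variance (assumed)
    if v <= m:
        # No overdispersion relative to the mean: treat as unclustered.
        return float("inf")
    return m * m / (v - m)


# Hypothetical data: failures concentrated on one test pattern out of four.
print(cluster_alpha([0, 0, 0, 4]))
```

This also shows why several instances of each structure are needed on the test chip: with a single instance per pattern the variance of the failure count carries no clustering information.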

Defect Density_i Calculation
The negative binomial yield equation can be solved for
defect density_i if the yield_i, $\alpha_i$, and critical area_i
are known. Therefore, the defect density_i can be
calculated for the test pattern because the yield_i
and $\alpha_i$ are measured directly at electrical test, and
the critical area_i is calculated from the area extraction
software.

Product Yield_i Calculation
Defect density_i and $\alpha_i$ are assumed to be the same
for all products and test patterns. This assumption
states that the amount and the spatial distribution
of defects are not product dependent. Their values
are established from the electrical testing of the test
pattern. The critical area_i is product specific and
determines how defect density_i and $\alpha_i$ affect the
yield for that particular product. The critical area_i
for the product is extracted from the layout artwork
file. These three values are then inserted into
the negative binomial yield equation, and the yield_i
of the product is solved.
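
The two calculations can be sketched together. Solving the negative binomial yield equation for defect density gives $D_0 = \alpha\,(Y^{-1/\alpha} - 1)/A_c$, which is applied to the test pattern; the result is then reused with the product's critical area. The numeric values below are hypothetical.

```python
def defect_density(test_yield, alpha, test_critical_area):
    """Invert yield = (1 + A*D/alpha)**(-alpha) for D on the test pattern."""
    if not 0.0 < test_yield <= 1.0:
        raise ValueError("test_yield must be in (0, 1]")
    return alpha * (test_yield ** (-1.0 / alpha) - 1.0) / test_critical_area


def product_yield(density, alpha, product_critical_area):
    """Apply the test-pattern density and alpha to a product critical area."""
    return (1.0 + product_critical_area * density / alpha) ** (-alpha)


# Hypothetical test-pattern measurements for one defect type.
d0 = defect_density(test_yield=0.90, alpha=0.5, test_critical_area=2.0)

# The product is assumed to have five times the critical area.
print(d0, product_yield(d0, alpha=0.5, product_critical_area=10.0))
```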

Defect Pareto 

After all the values for yield_i have been calculated,
they are listed in order of increasing yield to create
the defect pareto. An example of a defect pareto
from which the values for yield_i have been removed
is given in Table 5. Information from the defect
pareto is relayed to process and design engineering
to complete the process enhancement and design
for manufacturability.

Composite Layer Yield Model 
The modeled yields of each defect type in the single 
layer yield model are assumed to be independent of 
each other. The product of the modeled yields rep- 
resents the overall modeled yield. 



Table 5 Typical Yield and Defect Pareto

Defect                                    D0       Units   α       Estimated Yield

                                                                   0.xxx
M2 short-circuited lines                  11.350   100 m   0.108   0.xxx
M1/polysilicon contact open circuits       3.397   ppm     0.490   0.xxx
M1 short-circuited lines                   4.949   100 m   0.150   0.xxx
Local interconnect-polysilicon             5.376   100 m   0.059   0.xxx
  short-circuited lines
M2 open lines                              0.941   100 m   1.000   0.xxx
Active area p+ short circuits              4.804   100 m   0.098   0.xxx
Active area n+ short circuits              4.700   100 m   0.080   0.xxx
M2/M1 contact open circuits                0.266   ppm     0.061   0.xxx
M1/n+ contact open circuits                0.113   ppm     0.056   0.xxx
Polysilicon short-circuited lines          1.739   100 m   0.035   0.xxx
Polysilicon open lines                     2.610   100 m   0.027   0.xxx
M1/p+ contact open circuits                0.117   ppm     0.034   0.xxx
Active area n+ open circuits               0.831   100 m   0.017   0.xxx
Local interconnect-n+ active area          1.674   100 m   0.101   0.xxx
  short-circuited lines
Active area p+ open circuits               0.458   100 m   1.000   0.xxx
Dielectric 2 (capacitor)                   0.316   cm²     0.045   0.xxx
Local interconnect short-circuited lines   2.110   100 m   0.109   0.xxx
Dielectric 3 (capacitor)                   0.260   cm²     0.030   0.xxx
Dielectric 1 (flat)                        0.047   cm²     1.000   0.xxx
Local interconnect open lines              0.397   100 m   1.000   0.xxx
M3/M2 contact open circuits                0.201   ppm     0.027   0.xxx
Local interconnect-polysilicon-active      0.007   ppm     0.004   0.xxx
  area contact open circuits
M3 short-circuited lines                   3.596   100 m   0.146   0.xxx
M1 open lines                              0.138   100 m   0.003   0.xxx
Dielectric 2 (worst-case step)             0.661   cm²     0.055   0.xxx
M3 open lines                              2.338   100 m   0.013   0.xxx

Notes:

100 m = defects per 100 meters of length
ppm = parts per million defective
cm² = defects per square centimeter
$$\mathrm{yield} = \prod_i \mathrm{yield}_i$$

where $i$ is an index describing each defect type.
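
Building the defect pareto and the composite yield from the per-defect yields takes only a few lines. In the sketch below the yield values are invented for illustration, since the published table withholds the actual numbers; only the sorting and the product follow the text.

```python
import math


def defect_pareto(yields_by_defect):
    """List (defect, yield_i) pairs in order of increasing yield."""
    return sorted(yields_by_defect.items(), key=lambda item: item[1])


def composite_yield(yields_by_defect):
    """Overall modeled yield: product of the independent yield_i values."""
    return math.prod(yields_by_defect.values())


# Hypothetical per-defect modeled yields.
modeled = {
    "M1 short-circuited lines": 0.90,
    "M2 short-circuited lines": 0.80,
    "Polysilicon open lines": 0.95,
}

for defect, value in defect_pareto(modeled):
    print(f"{defect:30s} {value:.3f}")
print(f"composite yield {composite_yield(modeled):.3f}")
```

The first entry of the pareto is the defect type with the lowest modeled yield, which is the one the text says should receive priority.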

Product Chip Yield Analysis 

The continuous production of high-yield product 
wafers requires understanding the impact of 
design, manufacturing, processing, process equip- 
ment, and testing on yield. Root cause analysis of 
a low-yield wafer with 54 chips, each containing 
1.7 million transistors, capable of operating at 
200 MHz, is a difficult task. The following section
describes the testing methodology applied by the 
product yield enhancement engineer. 

After the wafers have completed the fabrication
process, all die are electrically tested. The electri- 
cal test code starts with simple continuity checks 
(open circuits and short circuits) on small areas 
of the die. Tests that require limited functionality are 
performed early in the test sequence. Electrical tests 
incrementally cover a larger area and more function- 
ality. First-pin fail identity and parameter value (volt- 
age, current, or test vector) are retained for each die. 

The code written in the test sequence allows the 
yield engineer to determine the cause of the failure. 
For example, high current failures (short circuits) 
can often be correlated to metal short circuits 
caused by inadequate metal etch, poor planariza- 
tion, or particle deposition. Electrical testing that 
can identify a specific area within a die as the prob- 
able failure site aids in analysis of the failure. If pin 1 
is short circuited to ground ($V_{SS}$), it is probable that
the cause of the failure is in close proximity to 
pin 1. Visual inspection and the scanning electron 
microscope (SEM) are often employed for this type 
of analysis. If no cause is found, more intensive fail- 
ure analysis is pursued. 

Functional testing in a production environment 
limits the amount of data that can be stored. The 
testing often stops at the first test failed. It is there- 
fore important that initial functional tests require 
only minimal functionality. Stored data that iden- 
tifies the failing pin and test vector provides the 
ability to perform commonality studies on manu- 
facturing data. If a failure mode can be isolated, 
analysis is simplified. 

Tests that are similar or specific to an area are 
grouped in bins. For example, a die that fails 
because of a short-circuited pad is collected in a bin 
labeled "CS"; a die that fails a vector for floating
point is stored in a bin labeled "FBOX" (functional
failure in the module where the floating point is
processed); fully functional die are stored in bins
labeled "$$." These bins can be analyzed for trends
and commonalities. 

The probe failure bins can be analyzed by several 
methods. Composite wafer mapping is useful for 
describing the failure pattern. A composite map 
graphically displays a probe bin for the entire lot as 
a percent of die failing a certain type of test. 
Composite maps can quickly show if lots have com- 
mon causes of failure. For example, after extensive 
analysis, the cause for a certain failure pattern can 
be correlated to a certain process operation. 
Analysis of composite maps from other lots quickly 
reveals if they were affected by a similar process 
operation. An example of a composite map for silicide
overgrowth is shown in Figure 8.

BINCODES: CS

      1    2    3    4    5    6    7    8    9
1               0   13    0    0    0
2          0   13    0   13    0   13    0
3    38   13    0    0  100  100    0    0   13
4     0    0   13   88   88   88   88    0    0
5    13    0   13  100  100   88   88    0    0
6          0    0   75   75   75    0    0
7               0   13   50   50   13

Figure 8 Composite Wafer Map for Silicide Overgrowth on CMOS-4 Process
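
A composite map of this kind can be computed by counting, for each die site, how many wafers in the lot failed into a given bin. The sketch below assumes each wafer is represented as a dictionary from (row, column) die site to bin code, which is a hypothetical data layout; percentages are rounded half up.

```python
import math
from collections import defaultdict


def composite_map(wafer_bins, bin_code):
    """Percent of die at each site that fall in bin_code across a lot.

    wafer_bins is a list with one dict per wafer, mapping a die site
    (row, column) to the bin code recorded at probe.
    """
    tested = defaultdict(int)
    failed = defaultdict(int)
    for wafer in wafer_bins:
        for site, code in wafer.items():
            tested[site] += 1
            if code == bin_code:
                failed[site] += 1
    return {
        site: math.floor(100.0 * failed[site] / tested[site] + 0.5)
        for site in tested
    }


# Hypothetical lot of eight wafers with two die sites each.
lot = [{(3, 5): "CS", (1, 4): "$$"} for _ in range(7)]
lot.append({(3, 5): "CS", (1, 4): "CS"})
percent_cs = composite_map(lot, "CS")
print(percent_cs)
```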

Another method to correlate probe results to a 
process change is to plot probe data on a cumula- 
tive sum control chart. Typically, probe data is plot- 
ted as a response to the sequence of a lot (wafers 
are processed in lots of 22 each) through fabrica- 
tion processes. The effectiveness of employing 
these control charts relies on two factors: the 
sequencing of lots must be randomized from one 
fabrication process to the next, and process 
changes must be meticulously recorded. When a 
slope inflection point correlates to the date of 
a process change, a highly probable cause for 
change in probe data can be identified, as shown in 
Figure 9. If two or more systems are used for a par- 
ticular process interchangeably, a control chart for 
each system can be generated. For example, an alu- 
minum etcher with low etch rate may consistently 
cause more CS probe failures than another alu- 
minum etcher with high etch rate. 
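
A cumulative sum chart of probe yield can be sketched as follows. The reference level is taken to be the overall mean of the plotted lots, and the change point is estimated as the lot where the cumulative sum is farthest from zero; both are common conventions assumed here, not details given in the text. The yield values are invented.

```python
from itertools import accumulate


def cusum(values, target=None):
    """Cumulative sum of deviations from a target (default: the mean)."""
    if target is None:
        target = sum(values) / len(values)
    return list(accumulate(v - target for v in values))


def estimated_change_index(values):
    """Index of the first lot after the largest slope inflection.

    The cumulative sum about the mean peaks (in magnitude) at the last
    lot before a sustained shift, so the change is placed one lot later.
    """
    sums = cusum(values)
    turning_point = max(range(len(sums)), key=lambda k: abs(sums[k]))
    return turning_point + 1


# Hypothetical probe yields for lots in process sequence, with an
# improvement introduced before the fifth lot.
probe_yield = [0.30, 0.32, 0.29, 0.31, 0.45, 0.47, 0.44, 0.46]
print(cusum(probe_yield))
print(estimated_change_index(probe_yield))
```

The estimated index can then be compared with the recorded dates of process changes, which is why the text stresses meticulous change records.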
Figure 9 CMOS-4 Yield Cumulative Sum Control Chart (x-axis: lots in process sequence; annotated process changes: planarization improvement, dielectric layers 2 and 3 changed, dielectric layer 1 thickness increased, metal 2 particle level increased)



Recording lot process history is crucial in per- 
forming probe yield analysis. As a CMOS-4 wafer is 
processed, hundreds of parameters are recorded. 
All of these process parameters must be easily 
accessible to software tools for analysis. Equally 
important, all the process parameters must be 
under statistical process control. Poor probe yields 
with occasional high-yielding lots indicate a poorly 
controlled process. Probe analysis of a process with 
wide variability often reveals several causes and 
pertains only to a single lot. Probe analysis of a 
process under statistical control normally identi- 
fies limited causes and represents the majority of 
the lots. 

Commonality studies to investigate process- 
related causes for differences between high- and 
low-yielding lots are often performed. Software 
tools extract all process-related parameters con- 
cerning the two groups of lots. Each group is ana- 
lyzed for a common parameter that is different from 
the other group. Commonality studies that identify 
processing differences between high- and low-yield 
lots are often confirmed by experimental analysis. 



The CMOS-4 process was debugged and qualified 
using a 128Kb SRAM. As discussed earlier, the SRAM
is designed to offer increased analysis capability 
over custom-designed CPUs. As the process con- 
verts to product, actual product yields may differ 
from projected product yields. The yield engineer 
needs to understand the similarities and differences 
among the chips. The SRAM yield determined by the 
lower level processes (under M1) is very similar
to the Alpha 21064 chip yield because large areas 
of the 21064 chip are designed the same as the 
SRAM. The upper layers of the 21064 chip can devi- 
ate in layout from the SRAM and can respond differ- 
ently to variations in the process. 

Due to the circuitry of the CMOS-4 die, foreign 
particle control and monitoring are critical. Nearly 
every process step deposits some particles. Particle 
size and type are important parameters to correlate 
to yield. These parameters are correlated with the 
aid of the defect density structures. The final and 
most significant correlation must be done to prod- 
uct yield. Visual inspection and characterization 
of particles on failing die compose the first-order 
analysis. The second order of analysis is to correlate
particle size and distribution (measured with automated
laser inspection tools) to yield. This analysis
defines the type of particles and the particle size
that will impact yield most significantly. Furthermore,
it uncovers the sources of the defects that
most contribute to yield loss. Yield engineers can
then prioritize defect reduction.

Yield Forecasting 

The yield model is often used during feasibility 
studies to forecast the yield of a planned chip prior 
to its design. Typically, the process of predicting 
the yield of a new planned integrated circuit chip 
starts by examining the basic layout of a chip. As 
shown in Figure 10, the structure of a chip is parti- 
tioned into functional subblocks. By understanding 
how much a subblock will change from its previous 
use, yield engineers can estimate changes to the 
new subblock. Frequently, a subblock will not 
change enough to cause its yield to be significantly 
different from the layout used in a previous chip. 
These subblocks are available in a library of artwork 
layouts kept in the form of their extracted critical or 
susceptible areas. 15 



[Figure 10 block labels: CPU, clock, central, floating-point accelerator, cache control, data cache, cache tag] 

Figure 10 Simplified Chip Layout 

In the next step, the critical areas estimated for 
each subblock are combined to obtain an estimate for the 
entire chip. Cache memories are usually added to 
the random logic areas by taking the critical areas of the 
memory support circuitry and adding the product of the 
cell critical area and the total number of bits 
required in the cache array. Once the total critical area 
estimates are complete, the defect density goals 
and targets for each layer are used to project a yield 
estimate for the foreseeable life of the product. 
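
As a rough illustration of the bookkeeping described above, the following Python sketch sums subblock critical areas, adds a cache array as the cell critical area times the bit count, and converts the total into a yield figure. The function names, the single lumped defect density, the simple Poisson yield form, and every number are assumptions made for illustration; they are not the yield model or data used for CMOS-4. 

```python
import math


def total_critical_area(logic_areas_cm2, support_area_cm2, cell_area_cm2, n_bits):
    """Sum random-logic subblocks, cache support circuitry, and the cell array."""
    return sum(logic_areas_cm2) + support_area_cm2 + cell_area_cm2 * n_bits


def poisson_yield(critical_area_cm2, defect_density_per_cm2):
    """Assumed simple Poisson form: Y = exp(-A * D) for one lumped defect type."""
    return math.exp(-critical_area_cm2 * defect_density_per_cm2)


# Hypothetical inputs (not taken from the paper).
logic_areas = [0.10, 0.05]      # critical areas of two logic subblocks, cm^2
support_area = 0.02             # cache support circuitry, cm^2
cell_area = 1.0e-7              # critical area of one memory cell, cm^2
bits = 1_000_000                # bits in the cache array

chip_area = total_critical_area(logic_areas, support_area, cell_area, bits)
chip_yield = poisson_yield(chip_area, defect_density_per_cm2=0.5)
print(f"critical area = {chip_area:.3f} cm^2, projected yield = {chip_yield:.3f}")
```

In a fuller treatment each layer would carry its own critical area and defect density target, and the per-layer yields would be multiplied together. 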

If the subblock is new, an estimate can often 
be made by understanding the type of logic or cir- 
cuitry being considered. If the circuitry is pure ran- 
dom logic, buses, or memory-like (e.g., in a cache 
controller), then the critical area estimate of that 
subblock will assume the artwork properties of 
these circuits. This procedure may appear overly 
complex. However, when the chip being estimated 
is 2 to 3 cm² in footprint area and is tightly packed 
with minimum ground rule artwork, this 
procedure is necessary for a reasonable yield estimate. 
A reasonable estimate is considered to be 
within ±20 percent of the actual yield; with 
more simplified approaches, errors of up to 300 percent 
could easily occur. 

Figure 11 compares the forecasted yield and the 
actual yield for the 128Kb SRAM and the Alpha 21064 
microprocessor. The vertical axis is the forecasted 
chip yield normalized to a relative scale. The open 
circle is the 128Kb SRAM chip forecast, and the 
closed square is the actual SRAM yield. The Alpha 
model yield is plotted with a closed circle, and the 
Alpha actual yield is plotted with an open square. 
The estimates were made approximately 4 to 6 
months prior to the product chip being prototyped 
in the fabrication facility. The projected estimates 
are relatively close to the actual chip yields during 
the same manufacturing time frame. 

[Figure 11 plots normalized chip yield against date. Key: open circle, 128Kb SRAM forecast; closed square, actual 128Kb SRAM; closed circle, Alpha model yield; open square, actual Alpha yield.] 

Figure 11 CMOS-4 Yield Forecast Projections 

Redundancy Yield Models of Caches 
and Memory Chips 

Relatively large cache RAMs are being used today on 
microprocessor chips to increase the processor's 
bandwidth performance. The bit capacity typically 
ranges from 100Kb to 1.5 megabits. This cache 
memory size can use up to 50 percent of the entire 
footprint area of a microprocessor chip. Because 
of these dimensions and bit capacities, on-board 
spares are sometimes used to provide fault tolerance 
for the processor's cache memory. If the number 
of repairable faults is less than or equal to the 
number of available spares, the chip can be repaired 
to a fully functioning device. 

Redundancy Yield Model The redundancy yield 
model consists of two sets of parameters. One 
set characterizes the process and the other set 
describes the product. The set of defect densities 
modeled in an older generation process is shown 
in Table 6; this set of parameters characterizes the 
random defects in the process. These defect densities 
can be expanded in mathematical form to 

$$\lambda = A_c \, D \qquad (3)$$

where $\lambda$ is a vector representing the four fault types 
of memory pattern bit failures as given below. 

1. Single bits 

2. Single word lines 

3. Single bit lines 

4. Chip kill 

These fault patterns are the parameters describ- 
ing the random faults on the memory portion of the 
product. $A_c$ is the critical area matrix of the product, 
and $D$ is the vector of defect densities at all 
modeled layers. The critical area matrix relates the 
defect densities modeled in the process to the 
cache memory circuit fault patterns of the product 
as given in Table 7. These arrays of numbers repre- 
sent the sensitivity of the cache memory circuit 
to random defects. The sensitivity to defects is 
obtained by calculating the critical area as 
described in the yield model section of this paper. 
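
The relationship between defect densities and fault counts can be sketched as a matrix-vector product. The Python fragment below is a minimal illustration, assuming a critical area matrix with one row per fault type and one column per defect type (Table 7 lists the same information with the axes swapped). The three defect types and all numerical values are hypothetical and are not taken from Tables 6 or 7. 

```python
FAULT_TYPES = ["single bit", "single word line", "single bit line", "chip kill"]


def mean_faults(critical_area, defect_density):
    """Return the vector of mean faults per chip: each entry is a row of the
    critical area matrix dotted with the defect density vector."""
    return [
        sum(a * d for a, d in zip(row, defect_density))
        for row in critical_area
    ]


# Hypothetical critical area matrix: rows follow FAULT_TYPES,
# columns are three illustrative defect types.
A_c = [
    [0.02, 0.00, 0.01],
    [0.00, 0.05, 0.00],
    [0.00, 0.00, 0.06],
    [0.10, 0.20, 0.15],
]
D = [0.5, 0.2, 0.4]  # hypothetical defect densities, in matching units

lam = mean_faults(A_c, D)
for name, value in zip(FAULT_TYPES, lam):
    print(f"{name:>16}: {value:.4f} mean faults per chip")
```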

Table 6 Defect Density Types 

D0 Parameter                        Units 
Dielectric 1 (flat)                 cm² 
Dielectric 2 (worst-case step)      cm² 
Dielectric 2 (flat)                 cm² 
Polysilicon short circuit           100 m 
M1 short circuit                    100 m 
M2 short circuit                    100 m 
Polysilicon open circuit            100 m 
M1 open circuit                     100 m 
M2 open circuit                     100 m 
N-gate area                         cm² 
P-gate area                         cm² 
Gate meander                        100 m 
Gate bird's beak                    100 m 
M2/M1 contact                       ppm 
M1-polysilicon contact              ppm 
M1-n+ contact                       ppm 
M1-p+ contact                       ppm 
Active area short circuits n+       100 m 
Active area open circuits n+        100 m 
Active area short circuits p+       100 m 
Active area open circuits p+        100 m 
Dielectric 3 (cap)                  cm² 
Dielectric 3 (worst-case step)      cm² 
M3/M2 contact                       ppm 
M3 open circuit                     100 m 
M3 short circuit                    100 m 

Notes: 
100 m = defects per 100 meters of length 
ppm = parts per million defective 
cm² = defects per square centimeter 

The probability of failure for a given size defect 
is the fraction of defects of that size that has been 
determined to have caused a fault. These probabilities 
of failure have been analyzed using Monte Carlo 
simulation techniques with the VLASIC yield simulator 
as described by Walker. 3 By establishing 
libraries of critical areas for different circuits, a 
cache memory can be simulated to the number of 
total bits required by the design. The random logic 
critical areas are thus lumped into the chip-kill 
category as seen in Table 7. In this fashion, the mean 
number of fails per chip for each circuit fault type 
can be computed. In many circumstances, the 
mean number of fails per chip can be obtained 
from existing memories that are similar in design 
to the future cache memory. Fault statistics can 
then be collected, and the failure distribution 
can be modeled by using the negative binomial 

Table 7 Critical Area Matrix for Processor Chip with Cache Memory 

Defect Type      Fault Type: Single Cells, Single Word Lines, Single Bit Lines, Chip Kill 



Dielectric 1 

Dielectric 2 (worst-case step) 

Dielectric 2 (flat) 

Polysilicon short circuit 

M1 short circuit 

M2 short circuit 

Polysilicon open circuit 

M1 open circuit 

M2 open circuit 

N-gate area 

P-gate area 

Gate meander 

Gate bird's beak 

M2/M1 contact 

M1- polysilicon contact 

M1-n+ contact 

M1- p+ contact 

Active area short circuits n+ 

Active area open circuits n+ 

Active area short circuits p+ 

Active area open circuits p+ 

Dielectric 3 (cap) 

Dielectric 3 (worst-case step) 

M3/M2 contact 

M3 open circuit 

M3 short circuit 



0.016560 
0.088800 

0.016060 
0.052460 
0.060980 
0.048640 
0.065060 
0.065600 

0.014060 
0.037340 
0.009220 
0.230700 
0.296600 
0.426600 
0.218400 
0.002733 
0.002730 
0.012420 
0.012420 



0.000050 0.002430 



0.000304 



0.000374 



0.000149 



0.000175 



0.000081 
0.000081 



0.020250 
0.141500 
0.018167 
0.033670 
0.170150 
0.157040 
0.114590 
0.890415 
0.116670 

0.023650 
0.026550 
0.002770 
0.108990 
0.082260 
0.227200 
0.142300 
0.012060 
0.012060 
0.077647 
0.077674 

0.003353 
0.021534 
0.000250 
0.000335 



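distribution with $\lambda$ and $\alpha$ as parameters. 10, 11, 17 This 
is accomplished through use of the probability 
model given as 

$$P(X = x) = \frac{\Gamma(x + \alpha)}{x! \, \Gamma(\alpha)} \, \frac{(\lambda/\alpha)^{x}}{(1 + \lambda/\alpha)^{x + \alpha}} \qquad (4)$$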

A nonlinear least-squares technique is used to fit 
the parameters to the observed distribution; these 
parameters can be used instead of defect densities if 
desired. 17 This alternative can give the model more 
flexibility, depending on which data is most appro- 
priate to use for an estimate. Examples of the fitted 
distributions of single cell failures, column failures, 
and double column failures are given in Table 8. 
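
To make the negative binomial model concrete, the Python sketch below evaluates the probability of x faults per chip and, from it, the fraction of chips repairable with a given number of spares. It uses the two fitted parameter values listed in Table 8; treating the first value (0.709781) as the mean and the second (0.575939) as the clustering parameter is an assumption, made because the row labels did not survive, although that assignment does reproduce the predicted column of the table. The helper names and the single-fault-type treatment of spares are illustrative only. 

```python
import math


def neg_binomial_pmf(x, lam, alpha):
    """Probability of exactly x faults for mean lam and clustering parameter alpha."""
    log_p = (
        math.lgamma(x + alpha) - math.lgamma(x + 1) - math.lgamma(alpha)
        + x * math.log(lam / alpha)
        - (x + alpha) * math.log1p(lam / alpha)
    )
    return math.exp(log_p)


def yield_with_spares(spares, lam, alpha):
    """Fraction of chips with no more faults than spares (one fault type only)."""
    return sum(neg_binomial_pmf(x, lam, alpha) for x in range(spares + 1))


LAMBDA = 0.709781  # assumed to be the fitted mean from Table 8
ALPHA = 0.575939   # assumed to be the fitted clustering parameter from Table 8

predicted = [neg_binomial_pmf(x, LAMBDA, ALPHA) for x in range(16)]
for x, p in enumerate(predicted[:4]):
    print(f"P(X = {x}) = {p:.6f}")
for spares in (0, 1, 2, 4):
    print(f"spares = {spares}: repairable fraction = "
          f"{yield_with_spares(spares, LAMBDA, ALPHA):.4f}")
```

Working in logarithms with `math.lgamma` avoids overflow in the gamma functions when x is large. 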

Estimates of chip yield for different combinations 
of redundant spares and disabled banks 
of the cache memory have also been used in the 
past. Figure 12 shows the yield as a function of 
redundant spares and sets of banks, where the 
total set is eight and the desired number of good 
banks is at least six out of eight. These computations 
are again performed using a model described 
by Stapper. 10, 17, 18 The yields were estimated approximately 
one year before actual manufacturing data 
existed for that product. These types of analyses are 
part of the feasibility studies that help the design 
engineers determine the optimum product yields. 
This model has been expanded and modified to 
include clustering of faults within a chip when a 
chip becomes rather large. 18 
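
The "at least six good banks out of eight" criterion can be illustrated with a simple binomial calculation. The sketch below assumes that each bank is good independently with the same probability, which ignores the fault clustering that the models cited above are designed to capture; the function name and the bank yield of 0.9 are hypothetical. 

```python
from math import comb


def prob_at_least_good(n_banks, min_good, bank_yield):
    """Probability that at least min_good of n_banks are good, assuming
    independent banks with identical yield (a simplifying assumption)."""
    return sum(
        comb(n_banks, k) * bank_yield ** k * (1.0 - bank_yield) ** (n_banks - k)
        for k in range(min_good, n_banks + 1)
    )


bank_yield = 0.9  # hypothetical yield of one bank after spare repair
all_eight = prob_at_least_good(8, 8, bank_yield)
six_of_eight = prob_at_least_good(8, 6, bank_yield)
print(f"all 8 banks good:      {all_eight:.4f}")
print(f"at least 6 of 8 good:  {six_of_eight:.4f}")
```

Under these assumptions, accepting chips with up to two disabled banks raises the usable fraction well above the all-banks-good case, which is the trade-off Figure 12 explores. 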

Table 8 Fitted Distributions of SRAM Functional Failures 

Faults per Chip    Observed    Predicted    Residuals 
 0                 0.626866    0.629694     -0.002829 
 1                 0.199627    0.200209     -0.000582 
 2                 0.082090    0.087091     -0.005001 
 3                 0.033582    0.041282     -0.007700 
 4                 0.016791    0.020374     -0.003583 
 5                 0.013060    0.010293      0.002766 
 6                 0.007463    0.005281      0.002182 
 7                 0.003731    0.002739      0.000993 
 8                 0.007463    0.001432      0.006031 
 9                 0.005597    0.000753      0.004844 
10                 0.001866    0.000398      0.001468 
11                 0.000000    0.000211     -0.000211 
12                 0.000000    0.000113     -0.000113 
13                 0.001866    0.000060      0.001806 
14                 0.000000    0.000032     -0.000032 
15                 0.000000    0.000017     -0.000017 

Parameter    Final Value    Standard    Lower 95%    Upper 95% 
$\lambda$    0.709781       0.019695    0.667540     0.752021 
$\alpha$     0.575939       0.027410    0.517150     0.634728 

Design for Manufacturability 

The yield equation (1) shows that yield is a function 
of both critical area and defect density. Since critical 
area is extracted from the circuit layout, it follows 
that changing the layout can affect the yield. 
However, the total number of die manufactured on 
a wafer is also a function of area, and changing 
the layout can also affect the number of die that can 
fit on a wafer. The design for manufacturability 
(DFM) method quantitatively analyzes these two 
relationships and maximizes the number of good 
die per wafer. 

Figure 13 is a flowchart that shows the role the 
single layer yield model plays in process enhancement 
and design for manufacturability. Two loops 
have been outlined in the diagram: the process 
enhancement loop and the design for manufacturability 
loop. 

DFM currently uses three software tools: the 
yield model, the cell counter, and the die counter. 
The yield model has been made available to design 
engineers for their use to model yield based on lay- 
out artwork. This software tool can be used in the 
design phase to evaluate the manufacturability of 
chip subblocks. The subblocks that produce the 
greatest number of good die per wafer can then be 
used in the chip design. Since M2 short circuits frequently 
cause defects, the work to date has focused 
on reducing the critical area of M2 short circuits. 
The idea is extendible to other circuit features such 
as line widths and contacts. 

Figure 12 Chip Yield Estimate with Spare 
and Bank Redundancy 

The cell counter routine counts the number of 
times that a specified portion of a circuit is repeated 
in a product layout. Critical area extractions can 
require large amounts of CPU time, but the cell 
counter allows the extraction to be performed on 
a small portion of the chip quickly. The result can 
then be multiplied by the result of the cell counter 
to model the yield. The cell counter can also be 
used during layout to determine the effect on yield 
of increasing or decreasing the size of the product. 
This technique is especially effective with very 
repetitive layout, for example, SRAMs. 

The third software tool included in DFM is the die 
counter. This routine counts the number of die or 
chips that can fit on a wafer given the die size. The 
die counter is used during layout along with the 
yield model to optimize the number of good chips 
per wafer. 
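
The trade-off that DFM optimizes can be sketched in a few lines of Python. The die counter below places die on a fixed grid centered on the wafer and counts those lying entirely inside the usable radius; this is only one simple counting scheme and is not a description of the actual tool. The wafer diameter, die sizes, and yields are hypothetical values chosen for illustration. 

```python
import math


def dies_per_wafer(wafer_diameter_mm, die_w_mm, die_h_mm, edge_exclusion_mm=0.0):
    """Count whole die on a grid whose lines pass through the wafer center."""
    radius = wafer_diameter_mm / 2.0 - edge_exclusion_mm
    nx = int(radius // die_w_mm) + 1
    ny = int(radius // die_h_mm) + 1
    count = 0
    for i in range(-nx, nx):
        for j in range(-ny, ny):
            # The corner farthest from the center decides whether the die fits.
            far_x = max(abs(i * die_w_mm), abs((i + 1) * die_w_mm))
            far_y = max(abs(j * die_h_mm), abs((j + 1) * die_h_mm))
            if math.hypot(far_x, far_y) <= radius:
                count += 1
    return count


def good_dies_per_wafer(n_dies, die_yield):
    """Expected number of good die: die count times die yield."""
    return n_dies * die_yield


# Two hypothetical layouts: a compact die and a relaxed die with
# wider M2 spacing (lower critical area, hence higher assumed yield).
compact = good_dies_per_wafer(dies_per_wafer(150.0, 10.0, 10.0), 0.50)
relaxed = good_dies_per_wafer(dies_per_wafer(150.0, 10.5, 10.5), 0.58)
print(f"compact layout: {compact:.1f} good die per wafer")
print(f"relaxed layout: {relaxed:.1f} good die per wafer")
```

Whichever layout gives the larger product of die count and yield is preferred; the relaxed layout wins only when its yield gain outweighs the die it loses to the larger footprint. 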

Conclusions 

Enhancement of integrated circuit yield at times 
requires many diverse analytical techniques 
for the detection and elimination of yield-limiting 
mechanisms. This paper has described 
some of the analytical tools necessary 
for the successful manufacture of very large, complex 
CMOS digital circuits. These tools have been in 
use at Digital for approximately ten years, and the 
tool set continues to evolve with each new genera- 
tion of technology. As these techniques are refined 
and new ones are developed, the overall under- 
standing of integrated circuit yield and yield loss is 
increased. 

References 

1. D. Dobberpuhl et al., "A 200MHz 64b Dual- 
Issue CMOS Microprocessor," IEEE Interna- 
tional Solid-State Circuits Conference Digest 
of Technical Papers (February 1992): 106-107. 



2. P. Gangatirkar, R. Presson, and L. Rosner, 
"Test/Characterization Procedures for High 
Density Silicon RAMs," Proceedings of the 
International Solid-State Circuits Conference 
(February 1982). 

3. D. Walker, "Yield Analysis for Fault-Tolerant 
Arrays," CMU Research Report No. CMUCAD- 
88-46, October 1988. 

4. D. Walker, Yield Simulation for Integrated 
Circuits (Kluwer Academic Publishers, 1987). 

5. D. Walker, "Experience with the VLASIC 
System in Defect Probability Prediction," CMU 
Research Report No. CMUCAD-90-41, Septem- 
ber 1990. 

6. J. Cunningham, "The Use and Evaluation of 
Yield Models in Integrated Circuit Manufacturing," 
IEEE Transactions on Semiconductor 
Manufacturing, vol. 3, no. 2 (May 1990). 

7. T. Okabe, M. Nagata, and S. Shimada, "Analysis 
of Yield of Integrated Circuits and a New 
Expression for the Yield," Electrical Engineer- 
ing in Japan, vol. 92 (December 1972). 

8. C. Stapper, "LSI Yield Modeling and Process 
Monitoring," IBM Journal of Research and 
Development, vol. 20 (May 1976). 

9. A. Rogers, Statistical Analysis of Spatial Dispersion 
(London: Pion Ltd., 1974). 

10. C. Stapper, "Large Area Fault Clusters and 
Fault Tolerance in VLSI Circuits: A Review," 
IBM Journal of Research and Development, 
vol. 33, no. 2 (March 1989): 162-173. 

11. R. Collica, "The Effect of the Number of 
Defect Mechanisms on Fault Clustering and 
its Detection Using Yield Model Parameters," 
IEEE Transactions on Semiconductor Manufacturing, 
vol. 5, no. 3 (August 1992). 

12. J. Pineda de Gyvez and J. Jess, "Systematic 
Extraction of Critical Areas from IC Layouts," 
International Workshop on Defect and Fault 
Tolerance in VLSI Systems (October 1989). 

13. A. Ferris-Prabhu, "Modeling the Critical Area 
in Yield Forecasts," IEEE Journal of Solid-State 
Circuits, vol. SC-20, no. 4 (August 1985). 

14. C. Stapper, "The Effects of Wafer to Wafer 
Defect Density Variations on Integrated Cir- 
cuit Defect and Fault Distributions," IBM Jour- 
nal of Research and Development, vol. 29, 
no. 1 (January 1985). 

15. A. Ferris-Prabhu, "Role of Defect Size Distribu- 
tion in Yield Modeling," IEEE Transactions on 
Electron Devices, vol. ED-32, no. 9 (September 
1985). 

16. S. Kikuda et al., "Optimized Redundancy 
Selection Based on Failure-Related Yield 
Model for 64-Mb DRAM and Beyond," IEEE 
Journal of Solid-State Circuits, vol. 26, no. 11 
(November 1991). 

17. C. Stapper, "On Yield, Fault Distributions, 
and Clustering of Particles," IBM Journal of 
Research and Development, vol. 30, no. 3 
(May 1986). 



18. C. Stapper, A. McLaren, and M. Dreckmann, 
"Yield Model for Productivity Optimization of 
VLSI Memory Chips with Redundancy and Par- 
tially Good Product," IBM Journal of Research 
and Development, vol. 24, no. 3 (May 1980). 

19. C. Stapper, "Small Area Fault Clusters and 
Fault Tolerance in VLSI Circuits," IBM Journal 
of Research and Development, vol. 33, no. 2 
(March 1989): 174-177. 



Digital Technical Journal Vol. 4 No. 2 Spring 1992 






Daniel B. Jackson 
David A. Bell 
Brian S. Doyle 
Bruce J. Fishbein 
David B. Krakauer 

Hot carrier-induced degradation of MOS transistors is an essential consideration in 
the development of CMOS processes. Most empirical approaches that characterize 
transistor hot carrier lifetime only provide indications of relative degradation; they 
do not make a connection between circuit operation and hot carrier degradation 
under experimental stress conditions. Digital's Advanced Semiconductor Develop- 
ment Group has devised a physically based method for ensuring that the hot 
carrier lifetime of transistors produced by a new process technology is acceptable. 
The models used incorporate degradation under three voltage bias conditions and 
allow for the effect of dominant manufacturing variations on transistor hot carrier 
lifetime. The method also takes into account the sensitivity of the circuit design 
to transistor hot carrier degradation. This hot carrier reliability assurance gives 
developers the ability to predict circuit hot carrier lifetime and thus allows them to 
maximize transistor performance. 



Hot carrier-induced degradation of metal-oxide 
semiconductor (MOS) transistors is an essential 
consideration in complementary metal-oxide semi- 
conductor (CMOS) technology development. The 
reduction of metal-oxide semiconductor field-effect 
transistor (MOSFET) channel dimensions to 
micron and submicron sizes has placed increasing 
importance on the reliability of the gate oxide 
and its interface with the underlying silicon. 
Achieving optimum transistor performance while 
maintaining the necessary circuit reliability is 
fundamental to manufacturing high-performance 
CMOS microprocessors. 

The hot carrier-induced transistor degradation 
arises from the high energy acquired by channel 
carriers, either electrons or holes, as they move 
from the MOSFET source to drain. 1-3 The electric 
field through which the carriers move has 
increased with succeeding generations of process 
technology because transistor dimensions have 
scaled faster than operating voltages. The high 
energy of the channel carriers leads to a gradual 
degradation of the transistor characteristics 
through charge trapping in the gate oxide and gen- 
eration of interface states. This hot carrier-induced 
degradation can become large enough to cause cir- 
cuit failure. 



Consequently, much work has been devoted 
to understanding hot carrier degradation, with a 
view toward developing accurate prediction tech- 
niques for both transistor and circuit lifetime, such 
as those discussed in this paper. 4-8 Substantial 
progress has also been made in increasing the hot 
carrier robustness of transistors through drain 
junction design and gate oxide optimization; a discussion 
of this is beyond the scope of this paper, 
however. Most of the effort to date has been 
focused on n-channel transistors, where, for cur- 
rent semiconductor process technologies, the 
degradation is more severe than in p-channel 
transistors. However, for technologies with an 
effective channel length of less than 0.5 micron 
(µm), p-channel hot carrier effects will become 
increasingly important. This paper thus focuses on 
n-channel transistors. 

A number of empirical approaches have evolved 
in the semiconductor industry to characterize tran- 
sistor hot carrier lifetime. Most of these approaches 
provide indications of relative degradation, allow- 
ing the comparison of different transistor designs. 
However, these schemes do not make a connection 
between circuit operation and hot carrier degra- 
dation under experimental stress conditions. The 
lack of a method for predicting circuit lifetime can 
lead to a conservative approach of compromising 
transistor performance in order to improve transis- 
tor hot carrier lifetime, whether this improvement 
is warranted or not. 

This paper presents a physically based method 
for determining whether the hot carrier-induced 
degradation in transistor characteristics is acceptable. 
The procedure is divided into two parts: an 
experimental measurement of transistor degradation 
and a determination of the maximum permissible 
transistor degradation for continued circuit 
operation. For convenience, we will refer to both 
transistor lifetime and circuit lifetime. The circuit 
lifetime is the length of time a circuit will operate 
in conformance to its stated specifications. The 
transistor lifetime is the time it takes a transistor 
under stress to reach a chosen degree of degrada- 
tion. The Circuit Considerations section shows that 
these two lifetimes are not necessarily the same. In 
practice, of course, it is the circuit lifetime that is 
important. A significant contribution of the work 
described in this paper is the ability to predict cir- 
cuit lifetime from transistor lifetime. 

The method begins with experimental measure- 
ments of transistor degradation under static stress 
conditions, which results in three fundamentally 
different types of damage to the transistor. Using 
these measurements, a model is developed that 
allows the determination of transistor lifetimes 
under dynamic bias conditions for an assumed max- 
imum acceptable degradation. Also, for a given cir- 
cuit, the method specifies the worst-case transistor 
produced as a result of manufacturing process vari- 
ations in such parameters as gate oxide thickness 
and channel length. The circuit-dependent part of 
the method relies on two important quantities: the 
set of worst-case, time-varying biases seen by tran- 
sistors in the circuit, and the maximum amount of 
transistor degradation that the circuit can tolerate 
and still remain functional. 

Physical Mechanisms — Measurement 
of Transistor Degradation 

Hot carrier degradation in a MOS transistor is usu- 
ally localized to the region where most of the volt- 
age drop between the drain and the source occurs, 
i.e., between the pinch-off point and the drain junc- 
tion, as shown in Figure 1. The size of the high field 
region near the drain junction is typically on the 
order of 0.1 µm. In this region, the charge carriers in 
the inversion layer are accelerated by the high field 



and become energetic or "hot." Since the mean free 
path of an electron in silicon is small, approxi- 
mately 60 angstroms, most electrons lose the 
excess energy they acquire by moving through 
the high field region via collisions with lattice 
phonons. However, some fraction of the electrons 
will traverse the high field region without suffer- 
ing enough collisions to lose all of the energy 
gained from the electric field. These electrons are 
the hot carriers that cause degradation in transistor 
characteristics. 



[Figure 1 labels: gate; high field region] 

Figure 1 Schematic of a Transistor in Saturation Showing the Device 
Biased into Pinch-off 

Under conditions that generate hot carriers, a rel- 
atively large number of "lucky" electrons (those suf- 
fering few collisions) can gain the 1.7 electron volts 
in energy necessary for electron-hole avalanche generation. 
The generated holes move out through the 
substrate and can be measured as a substrate current 
$I_B$. Fewer of the lucky electrons will gain the 
3.1 electron volts in energy necessary to overcome 
the silicon/silicon dioxide (Si/SiO2) barrier and pass 
into the gate dielectric. At sufficiently high voltages, 
the holes created in the avalanche can also have 
enough energy to be injected into the gate dielec- 
tric of the transistor. Because the avalanche-gener- 
ated holes can be measured as a substrate current, 
substrate current is frequently used as a measure of 
the driving force for hot carrier degradation. It gives 
a convenient measure of the number and energy of 
high energy carriers in the pinch-off region. 

Location of Hot Carrier Damage 

The injection of charge into the oxide is, by itself, 
not a cause for concern. Measurements using the 
floating gate technique, for example, show that this 
current is on the order of picoamperes per µm of 
gate width for electron injection at high drain voltages 
$V_{DS}$. 9 For hole injection, the current is several 
orders of magnitude smaller. 

However, a problem does arise from the small 
fraction of the injected charge trapped in the oxide, 
$N_{ox}$, and the interface states generated by the hot 
carriers, $N_{ss}$. The oxide and interface damage causes 
changes in the linear region transconductance $g_m$ 
and/or the threshold voltage $V_T$, as well as in the 
saturated drain current $I_{DSAT}$ of the transistor. The 
$I_{DSAT}$ of the transistor is the drain current with the 
gate and drain voltages equal to the positive power 
supply voltage $V_{DD}$ and the source voltage equal to 
the negative power supply voltage $V_{SS}$. 

Oxide traps and interface states that are uniformly 
distributed along the channel can be identified 
from the poststress current versus voltage 
characteristics. 10 A uniform $N_{ss}$ causes a change in 
the subthreshold current characteristics, and a uniform 
$N_{ox}$ causes a shift in the $V_T$. However, hot carrier 
stress damage is not uniform. It is localized 
to the region around the drain junction shown in 
Figure 1. The degradation resulting from this 
nonuniform damage is limited primarily to gate 
voltages $V_{GS}$ above $V_T$, i.e., the drain current $I_D$ is 
reduced when $V_{GS}$ is greater than $V_T$, irrespective of 
whether $N_{ox}$ or $N_{ss}$ is created. 11 

This interpretation is supported by two- 
dimensional simulations. 12 The similarity in effects 
of interface states and oxide traps has resulted in a 
poor understanding of the types of damage that 
occur in hot carrier stressing and, in turn, has led to 
difficulties in predicting transistor lifetimes under 
dynamic stress conditions. 

Kinetics of Damage Evolution 
Figure 2 shows typical stress results at five different 
drain voltages. The gate voltage in each case corresponds 
to the peak substrate current. Although 
many criteria are used to monitor stress damage, 
e.g., $V_T$, $g_m$, and linear $I_D$, the criterion used to 
obtain the results presented in this section is the 
change in $I_{DSAT}$, the saturated drain current when both 
$V_{DS}$ and $V_{GS}$ equal $V_{DD}$. From a circuit standpoint, 
$I_{DSAT}$ has been identified as one of the most 
meaningful hot carrier monitors. Figure 2 illustrates 
that, at all stress voltages, the degradation as a 
function of time $t$ obeys the equation 

$$\frac{\Delta I_{DSAT}}{I_{DSAT}} = K \, t^{n} \qquad (1)$$

where the constants $K$ and $n$ are empirically determined 
from experimental data; the value of $n$ is usually 
between 0.3 and 0.7. From curves such as those 
shown in Figure 2, it is possible to interpolate or 
extrapolate to a certain level of degradation and 
obtain the transistor lifetime at a given voltage. The 
choice of the transistor lifetime criterion, in this case 
a 5 percent change in $I_{DSAT}$, is discussed more fully 
in the Circuit Considerations section. 

Figure 2 Degradation as a Function of Time 
for Devices Stressed at Different 
Drain Voltages 
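
The extrapolation from stress data to a transistor lifetime can be sketched as follows. The Python fragment assumes that the fractional change in the saturated drain current follows a power law in time, K times t to the power n, fits K and n by least squares on logarithmic axes, and solves for the time at which the degradation reaches the 5 percent criterion. The function names and the synthetic stress data are hypothetical and are not measurements from this work. 

```python
import math


def fit_power_law(times_s, degradations):
    """Least-squares fit of log(degradation) = log(K) + n * log(t)."""
    xs = [math.log(t) for t in times_s]
    ys = [math.log(d) for d in degradations]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    n = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    k = math.exp(mean_y - n * mean_x)
    return k, n


def transistor_lifetime(k, n, criterion=0.05):
    """Time at which k * t**n reaches the chosen degradation criterion."""
    return (criterion / k) ** (1.0 / n)


# Synthetic stress data generated with K = 0.002 and n = 0.5 (hypothetical).
times = [10.0, 100.0, 1000.0, 10000.0]
degradation = [0.002 * t ** 0.5 for t in times]

k_fit, n_fit = fit_power_law(times, degradation)
tau = transistor_lifetime(k_fit, n_fit)
print(f"K = {k_fit:.4g}, n = {n_fit:.3f}, lifetime at 5% = {tau:.1f} s")
```

Because n is well below 1, a small error in the fitted slope changes the extrapolated lifetime considerably, which is one reason the stress measurements are repeated at several drain voltages. 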

Three Types of Stress Damage 
In the stressing of n-channel metal-oxide semiconductor 
transistors, there are three major regions 
of damage in the gate voltage range. 4-8 The three 
regions are distinguished by the charge injected 
into the oxide during hot carrier stress. Figure 3 
shows the gate current $I_G$ as a function of gate voltage 
for a fixed drain voltage. The figure is divided 
into three regions: the low gate voltage region, 
region I, in which holes are the predominant component 
of the gate current; the medium gate voltage 
region, region II, where both electrons and 
holes are injected in approximately equal numbers; 
and the high gate voltage region, region III, where 
electrons are the main current species. In region I, 
the predominant damage mechanism is the generation 
of electron traps in the bulk oxide by the 
injected holes, $N_{ox,h}$. In region II, the dominant damage 
mechanism is the generation of interface states 
$N_{ss}$. In region III, the dominant damage mechanism 
is the generation of electron traps in the bulk oxide 
by the injected electrons, $N_{ox,e}$. In circuit operation, 
all three types of damage occur. The relative importance 
of each depends not only on the voltage of 
operation, but also on the relative rate of degradation 
in each region as shown in the section 
Dynamic Hot Carrier Effects. 

Figure 3 Gate Current as a Function of 
Gate Voltage for a Device Biased 
at High Drain Voltages 

Medium Gate Voltages The most widely recog- 
nized damage mechanism occurs in the medium 
gate voltage region and is caused by interface state 
generation. Curve (b) in Figure 4 shows the amount 
of degradation suffered as a function of gate voltage 
for a series of devices stressed at a fixed drain volt- 
age. The amount of degradation peaks at the same 
gate voltage as the peak in substrate current, 
depicted in curve (a). A direct relationship exists 
between the degradation and the peak substrate 
current during stress. 3 The transistor lifetime under 
stress conditions that generate interface states is 
given by 

$$\tau_{N_{ss}} = A \, I_B^{-m} \qquad (2)$$

where $\tau_{N_{ss}}$ is the transistor lifetime for stress in 
region II of Figure 3, and $I_B$ is the substrate current. 
The constants $A$ and $m$ are established by fitting 
the experimental data; the value of $m$ is usually 
about 2.9. Figure 5 shows the dependence of $\tau_{N_{ss}}$ 
(obtained from data similar to that in Figure 2) on 
$I_B$. Under these gate voltage conditions, transistor 
lifetime can be predicted simply by establishing the 
values of $A$ and $m$ in equation (2) and extrapolating 
to the known substrate current at the operating 
drain voltage. 
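
The extrapolation to operating conditions can be illustrated with a short Python sketch. It assumes that the lifetime under interface state generation varies as a constant A times the substrate current raised to the power -m, determines A and m from two stress points, and evaluates the lifetime at a much smaller operating substrate current. The assumed functional form, the function names, and all current and lifetime values are hypothetical. 

```python
import math


def fit_two_point(i_b1, tau1, i_b2, tau2):
    """Solve tau = a * i_b**(-m) for a and m from two stress measurements."""
    m = -math.log(tau2 / tau1) / math.log(i_b2 / i_b1)
    a = tau1 * i_b1 ** m
    return a, m


def lifetime(a, m, i_b):
    """Lifetime at substrate current i_b under the assumed power law."""
    return a * i_b ** (-m)


# Hypothetical stress results generated with a = 1e-8 and m = 2.9.
A_TRUE, M_TRUE = 1.0e-8, 2.9
i_b_low, i_b_high = 1.0e-4, 5.0e-4            # substrate currents in amperes
tau_low = A_TRUE * i_b_low ** -M_TRUE          # lifetimes in seconds
tau_high = A_TRUE * i_b_high ** -M_TRUE

a_fit, m_fit = fit_two_point(i_b_low, tau_low, i_b_high, tau_high)
tau_operating = lifetime(a_fit, m_fit, 1.0e-6)
print(f"m = {m_fit:.2f}, lifetime at operating current = {tau_operating:.3g} s")
```

With m near 2.9, halving the substrate current lengthens the predicted lifetime by roughly a factor of 7.5, so modest reductions in substrate current have a large effect. 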
Low Gate Voltages The second type of damage 
results from the injection of avalanche-generated 
hot holes into the oxide. These injected holes can 
create neutral electron traps and hole traps. s - 7b 
The hole traps do not significantly affect transistor 
lifetime estimates and will not be discussed exten- 
sively in this paper. It was not evident in previous 
work that electron traps were being created. Since 
the traps were neutral, their effects on the current 
versus voltage characteristics were not observed. 
However, if the stressed transistors of Figure 4 are 
injected with electrons (brief stress with high $V_{GS}$ 
equal to $V_{DS}$), the neutral electron traps become 
charged and contribute to the degradation, as 
shown in curve (c). The lifetime due to electron 
traps created at low gate voltage obeys 



(3) 



where $\tau_{N_{ox,h}}$ is the transistor lifetime for stress in 
region I of Figure 3. 7 $I_B$ and $I_D$ are the substrate and 
drain currents during stress. The constants $B$ and $n$ 
are determined by fits to the experimental data; the 
value of $n$ is usually in the range of 7 to 12. Figure 6 
shows the transistor lifetime for oxide trap damage 
created at low gate voltages as a function of the 
ratio of the substrate current to the drain current. 
Using equation (3), the transistor lifetime for elec- 
tron trap damage at low gate voltage can be found 
at any given drain voltage. 

High Gate Voltages The third type of stress damage
in NMOS transistors occurs at high gate voltages,
under conditions that inject electrons into
the oxide (region III of Figure 3). This damage, 
caused by electron trapping, was first identified 
through the different gradient obtained when plot- 
ting degradation against time, as shown in Figure 6. 
Similar to equation (1), equation (4) for the time 
behavior is 



$$\Delta(t) = D\,t^{k} \qquad (4)$$



where the constants D and k are determined by 
fitting the data; the value of k is usually between 
0.15 and 0.35. The same procedure used to obtain 
the results presented in Figure 2 was used to pre- 
dict a transistor lifetime for this type of damage. 
That is, apply stress to a number of devices at different
drain voltages (with $V_{GS}$ equal to $V_{DS}$), and carry
out the extrapolation/interpolation to obtain the 
transistor lifetimes at the operating drain voltage. 
The transistor lifetime can be plotted as 



$$\tau_{N_{ox,e}}\,I_D = C\left(\frac{I_G}{I_D}\right)^{-l} \qquad (5)$$

where $\tau_{N_{ox,e}}$ is the transistor lifetime under stress in
region III of Figure 3. $I_D$ and $I_G$ are the drain and gate
currents during stress. The constants C and l are
determined by fits to the experimental data; the
value of l is usually about 2.9. Figure 7 indicates
that the data for these stresses is linear, where $V_{GS}$
is equal to $V_{DS}$, allowing for the extrapolation and
prediction of transistor lifetimes at different drain 
voltages for this type of stress damage. 

Figure 6 Transistor Lifetime for Oxide Trap Damage Created at Low Gate Voltage

Process Variations 

In addition to understanding the effects of different 
applied voltages on the transistor, it is important to 
consider the effect of manufacturing process vari- 
ability on hot carrier reliability. For a given source/ 
drain design, the major cause of that variability is 
variation in the transistor effective channel length 
$L_{eff}$. Of secondary importance is variation in the
gate oxide thickness $t_{ox}$. From a performance standpoint,
it is desirable to make $L_{eff}$ and $t_{ox}$ as small as
possible in order to get the highest saturated drain
current $I_{DSAT}$, and thereby the highest speed. However,
as $L_{eff}$ and $t_{ox}$ decrease, the transistor lifetime
rapidly decreases. 

Figure 7 Transistor Lifetime for Oxide Trap Damage Created for Equal Gate and Drain Voltage Stress

Because the damage region remains roughly constant
in size, as the channel length decreases a
larger fraction of the channel is damaged and the
transistor lifetime is shorter. The direct effect of $L_{eff}$
on transistor lifetime is approximately

$$\tau \propto L_{eff}^{\,n_1} \qquad (6)$$

where $n_1$ is usually about 3. 14 There is also an
implied transistor lifetime dependence on $L_{eff}$
through the increase in $I_B$ that results from a
smaller $L_{eff}$. Taking this into account, the net dependence
of transistor lifetime on effective channel
length is approximately 

r*//^ ff . (7) 

The effect of variations in $t_{ox}$ is accounted for in
the changes in $I_B$, $I_G$, and $I_D$ that result.
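
As a rough illustration of the direct channel-length effect alone, the sketch below evaluates how lifetime scales when the effective channel length shrinks, assuming lifetime is proportional to the effective channel length raised to a power of about 3. It deliberately ignores the additional dependence through the substrate current, and the function name is hypothetical.

```python
def direct_lifetime_ratio(l_eff_new, l_eff_old, exponent=3.0):
    """Lifetime ratio from the direct channel-length effect only."""
    return (l_eff_new / l_eff_old) ** exponent

# A channel that is 10 percent shorter than the reference device.
RATIO_10_PERCENT_SHORTER = direct_lifetime_ratio(0.9, 1.0)
```

With an exponent of 3, a 10 percent reduction in effective channel length already removes roughly a quarter of the lifetime before the substrate current increase is considered.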

One of the keys to designing the source/drain 
process is to optimize the moderately doped drain 
(MDD) region for minimum peak substrate current
under operating conditions and thus maximize
the hot carrier reliability. However, once the MDD
implant and diffusion process is fixed, the variation
in transistor hot carrier lifetime for a given $L_{eff}$ is
relatively small. Typically, we find that for a stable 
process, transistor lifetime variation is less than a 
factor of two from one integrated circuit manufac- 
turing lot to another. 

Dynamic Hot Carrier Effects 
An accurate transistor lifetime prediction model
for hot carrier reliability must consider the actual
stress conditions to which MOSFETs are subjected
under normal circuit operation. Thus, although
accurate transistor hot carrier lifetime models exist
for static bias conditions, dynamic bias conditions
must also be taken into account. 15,16 Initial
attempts to predict transistor dynamic stress lifetimes
were based on quasi-static sums of transistor
static stress lifetimes. 17,18 However, these initial predictions
were much longer than the transistor lifetime
measured under dynamic stress, because they only
considered the contribution from interface state
generation and omitted the effects of bulk oxide
trapping. 19,20 While the enhanced dynamic stress
degradation has generated much debate and numerous
explanations, the effect can be explained by
considering a quasi-static sum of the three damage
mechanisms detailed in the previous section. 19,21

An Accurate Dynamic Hot Carrier Lifetime Model 
Our model, based on transistor static stress life- 
times, differs from previous static-based models in 
that it takes into account all three types of damage. 
During dynamic stress, with the drain voltage con- 
stant at some high voltage, a dynamic gate voltage 
subjects the MOSFET to the three types of damage 
shown in Figure 8. (The stress waveform shown in 
Figure 8 is for illustrative purposes only and does 
not necessarily reflect an actual circuit waveform.) 

Figure 8 Dynamic Stress Waveform and ...

If the instantaneous values of $I_D$, $I_B$, and $I_G$ are
known for the dynamic stress, we can calculate
the contributions of the three damage mechanisms
by integrating equations (2), (3), and (5) over the
time period T of the dynamic stress waveform. The
resulting expressions for the lifetimes due to the
three physical mechanisms are given by



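$$\frac{1}{\tau_{N_{ss}}} = \frac{1}{T}\int_0^T \frac{I_B^{\,m}}{A}\,dt, \qquad (8)$$

$$\frac{1}{\tau_{N_{ox,h}}} = \frac{1}{T}\int_0^T \frac{1}{B}\left(\frac{I_B}{I_D}\right)^{n} I_D\,dt, \text{ and} \qquad (9)$$

$$\frac{1}{\tau_{N_{ox,e}}} = \frac{1}{T}\int_0^T \frac{1}{C}\left(\frac{I_G}{I_D}\right)^{l} I_D\,dt. \qquad (10)$$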



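Treating $1/\tau$ as a damage function, the total
dynamic stress damage can be modeled as

$$\frac{1}{\tau} = \frac{1}{\tau_{N_{ss}}} + \frac{1}{\tau_{N_{ox,h}}} + \frac{1}{\tau_{N_{ox,e}}} \qquad (11)$$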
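
A minimal sketch of this quasi-static sum, treating the reciprocal of each lifetime as a damage rate and adding the three rates. The function name and the example lifetimes are hypothetical and chosen only to show the arithmetic.

```python
def combined_lifetime(tau_ss, tau_oxh, tau_oxe):
    """Total lifetime when the three damage rates (1/tau) add."""
    damage_rate = 1.0 / tau_ss + 1.0 / tau_oxh + 1.0 / tau_oxe
    return 1.0 / damage_rate

# Hypothetical component lifetimes in arbitrary but identical units.
TAU_TOTAL = combined_lifetime(100.0, 10.0, 1000.0)
```

Because the rates add, the combined lifetime is always shorter than the shortest individual lifetime, and the mechanism with the shortest lifetime dominates the result.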



Example of the Dynamic Hot Carrier Lifetime 
Model To demonstrate the application of equation (11),
consider the dynamic stress waveform
in Figure 8 applied to n-channel MOSFETs. $V_{GS}$ was
pulsed between zero volts and $V_{DS}$ with 2-microsecond
rise/fall times, a 20-microsecond period,
and 10 and 50 percent duty cycles. Stresses were
performed for different values of $V_{DS}$, which was
held constant during each stress. 

Figure 9 shows the dynamic stress results, as a 
function of $V_{DS}$, compared to the transistor dynamic
stress lifetimes predicted by a quasi-static interpre- 
tation of equation (2) and data similar to that shown 
in Figure 5. The measured transistor dynamic life- 
time is noticeably shorter than that predicted by the 
quasi-static sum. The transistor dynamic stress life- 
times can be as much as two orders of magnitude 
lower than $\tau_{N_{ss}}$. These results demonstrate that consideration
of only one damage mechanism, in this
case the interface states created by medium gate
voltage stress, is insufficient for predicting transistor
dynamic stress lifetimes. The contributions of
$N_{ox,h}$ and $N_{ox,e}$ must also be included, as indicated
by equation (11). 

Figure 9 Transistor Dynamic Stress Lifetime as a Function of Drain Voltage

The quasi-static contributions of the three types
of damage, $N_{ss}$, $N_{ox,h}$, and $N_{ox,e}$, are shown in Figures
10 and 11, with 10 and 50 percent duty cycles,
respectively. The curves shown were calculated
from the static stress results of the previous section
and equations (8)-(10), except for $N_{ox,e}$. The calculation
of the quasi-static contribution of $N_{ox,e}$
requires knowledge of the instantaneous gate current
which, in this case, requires time-consuming
measurement techniques. Fortunately, the dynamic
stress waveform chosen allows us to make simplifying
approximations in the calculations of $\tau_{N_{ox,e}}$.
Since $V_{DS}$ is constant during the dynamic stress,



$$\tau_{N_{ox,e}} = \frac{\tau_{N_{ox,e}(static)}}{d} \qquad (12)$$



where $\tau_{N_{ox,e}(static)}$ is the transistor static stress lifetime
when $V_{GS}$ equals $V_{DS}$, and d is the effective duty
cycle in percent. In our example, we assume that
the time during which $V_{GS}$ equals $V_{DS}$ is large compared
to the time spent during $V_{GS}$ transition in the
high $V_{GS}$ region. Additionally, $I_G$ drops off quickly
with reduced $V_{GS}$. Thus, d may be approximated by
the fraction of the time period that $V_{GS}$ equals $V_{DS}$.
As measured on an oscilloscope, the effective duty 
cycles were 7 and 45 percent for settings of 10 and 
50 percent, respectively. 
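
The duty-cycle approximation can be sketched as follows. One assumption is made here for convenience: the effective duty cycle is passed as a fraction between 0 and 1 (0.07 and 0.45 for the measured 7 and 45 percent), so that a duty cycle of 1 recovers the static lifetime. The function name and the unit static lifetime are hypothetical.

```python
def oxe_dynamic_lifetime(tau_static, duty_fraction):
    """Electron-trap lifetime when damage accrues only while V_GS equals V_DS."""
    if not 0.0 < duty_fraction <= 1.0:
        raise ValueError("duty_fraction must be in (0, 1]")
    return tau_static / duty_fraction

TAU_STATIC = 1.0  # arbitrary units; only ratios matter in this sketch
TAU_10_PERCENT_SETTING = oxe_dynamic_lifetime(TAU_STATIC, 0.07)
TAU_50_PERCENT_SETTING = oxe_dynamic_lifetime(TAU_STATIC, 0.45)
RATIO = TAU_10_PERCENT_SETTING / TAU_50_PERCENT_SETTING
```

Under this approximation the electron-trap contribution alone differs by a factor of about 6.4 between the two settings, since only the ratio of the effective duty cycles matters.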

The dynamic stress model of equation (11) 
was applied with the data from Figures 10 and 11 
to yield the transistor dynamic stress lifetime 
curves in Figures 12 and 13. Figures 12 and 13 also 
include the measured transistor dynamic stress life- 
time curves for both 10 and 50 percent duty cycles 
and reveal an excellent match between the pre- 
dicted and measured transistor lifetimes. These 
figures show that the inclusion of $N_{ss}$, $N_{ox,e}$, and
$N_{ox,h}$ fully accounts for the enhanced dynamic
stress degradation. 

Figure 10 Calculated Contributions from the Three Types of Dynamic Stress Damage (10 Percent Duty Cycle)

Figure 11 Calculated Contributions from the Three Types of Dynamic Stress Damage (50 Percent Duty Cycle)

Figure 12 Measured and Calculated Transistor Dynamic Stress Lifetimes (10 Percent Duty Cycle)

Figure 13 Measured and Calculated Transistor Dynamic Stress Lifetimes (50 Percent Duty Cycle)

Discussion of the Dynamic Lifetime Model Further
examination of the contributions of each of the
three types of damage illustrates their effect on
transistor dynamic lifetimes relative to stress waveforms.
Figures 10 and 11 indicate that $N_{ox,e}$ has the
lowest transistor lifetime at high $V_{DS}$. Thus, the transistor
dynamic lifetime is sensitive to the duration
of stress in the region where $V_{GS}$ equals $V_{DS}$; the contribution
of $N_{ox,e}$ decreases with the duty cycle.
In the case of a 10 percent duty cycle, the contribution
of $N_{ox,e}$ is reduced to the contribution level
of $N_{ox,h}$. As shown in Figures 12 and 13, the measured
and calculated transistor dynamic stress lifetimes
are 3 to 10 times greater for the 10 percent
duty cycle than for the 50 percent duty cycle.

The results presented in this section are for one 
CMOS technology. The relative importance of the 
three types of damage can be different in other 
CMOS technologies. Thus, different technologies 
can have different dependencies on the gate and 
drain voltage waveforms. These differences can be 
analyzed through the use of a SPICE postprocessor,
which calculates the damage integrals of equations 
(8)-(10) from the gate and drain voltage waveforms. 
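
A postprocessor of this kind could be sketched as below for the interface-state term only. The sketch assumes the power-law form lifetime = A * I_B^(-m) for static stress, so that the damage rate averaged over one period is the time average of I_B(t)^m / A. The function names are hypothetical, and the sampled waveform stands in for currents that would come from a circuit simulation.

```python
def interface_state_damage_rate(times, substrate_currents, a, m):
    """Time-average of I_B(t)**m / A over the waveform (trapezoidal rule)."""
    period = times[-1] - times[0]
    total = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        f_prev = substrate_currents[k - 1] ** m / a
        f_next = substrate_currents[k] ** m / a
        total += 0.5 * (f_prev + f_next) * dt
    return total / period

def dynamic_interface_state_lifetime(times, substrate_currents, a, m):
    """Lifetime as the reciprocal of the averaged damage rate."""
    return 1.0 / interface_state_damage_rate(times, substrate_currents, a, m)

A_COEFF, M_EXP = 1e-9, 2.9   # hypothetical fitted coefficients
I_STRESS = 1e-5              # amperes, hypothetical

# Constant stress: the result should equal the static lifetime.
TAU_CONSTANT = dynamic_interface_state_lifetime(
    [0.0, 1.0, 2.0], [I_STRESS, I_STRESS, I_STRESS], A_COEFF, M_EXP)

# Ideal square wave: substrate current flows for half of the period.
TAU_HALF = dynamic_interface_state_lifetime(
    [0.0, 1.0, 1.0, 2.0], [I_STRESS, I_STRESS, 0.0, 0.0], A_COEFF, M_EXP)
```

The same averaging loop would apply to the two oxide-trap terms once their integrands are supplied, and the three resulting rates would then be added.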

Circuit Considerations 

The transistor lifetime integrals of equations 
(8)-(10), along with equation (11) for the combined 
transistor lifetime, contain the essential transistor 
physics. However, a number of circuit-related con- 
siderations are important in assessing the hot car- 
rier reliability of a process technology. These 
include the actual waveforms experienced by the 
transistor, the implications of speed binning, and 
the amount of degradation that will cause a circuit 
to fail. These topics are discussed in this section. 

In-circuit Waveforms 

The technique described in the previous section for 
determining hot carrier transistor lifetime under 
dynamic bias conditions can be applied to tran- 
sistors in integrated circuits, if the gate and drain 
voltage waveforms are accurately known. The wave- 
form shapes depend on the circuit type of which 
the device is a part, and also on the magnitude of 
switching transients and power supply noise, 
which can elevate the drain voltage above $V_{DD}$, the
on-chip positive power supply voltage. The follow- 
ing discussion of the factors contributing to inte- 
grated circuit waveform amplitude and timing uses 
CMOS microprocessor design as an example. 

Maximum Node Voltage Switching transients
and power supply ringing on an integrated circuit
can raise internal node voltages well above the
nominal positive power supply voltage $V_{DD}$. Table 1
categorizes these effects for a microprocessor chip
with a nominal power supply voltage of 3.3 volts,
and shows that voltages as high as 4.3 volts are
expected. In addition to the 5 percent power
supply tolerance based on ripple and drift considerations,
the on-chip positive and negative power
supplies, $V_{DD}$ and $V_{SS}$, experience ringing due to
inductance in the package. The ringing worsens
with increases in the package inductance and the
rate at which current is drawn into the chip (dI/dt).
With advances in technology, increases in clock fre- 
quency will result in more severe ringing unless 
accompanied by a reduction in the package induc- 
tance or an increase in the amount of on-chip 
decoupling capacitance. The combined effects of 
power supply ripple and inductive ringing can be 
modeled by a sine wave superimposed on the static 
supply with the appropriate amplitude and a fre- 
quency several times greater than the clock rate 
(depending on the number of clock phases). 
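
The sine-wave supply model can be sketched as follows. The amplitude of 0.34 volt is an assumption that lumps the 0.165-volt tolerance and 0.175-volt ringing entries of Table 1 into one term; the clock frequency and the frequency multiple are hypothetical placeholders.

```python
import math

def supply_voltage(t, v_static=3.3, amplitude=0.34, clock_hz=100e6, multiple=4):
    """Static supply plus a sine wave at a multiple of the clock rate."""
    noise_hz = multiple * clock_hz
    return v_static + amplitude * math.sin(2.0 * math.pi * noise_hz * t)

NOISE_PERIOD = 1.0 / (4 * 100e6)                      # seconds
PEAK = supply_voltage(NOISE_PERIOD / 4.0)             # crest of the sine wave
TROUGH = supply_voltage(3.0 * NOISE_PERIOD / 4.0)     # bottom of the sine wave
```

A waveform like this can be added to the drain voltage seen by a transistor so that the phase of the supply noise relative to a switching event can be varied when searching for the worst case.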



Table 1 Contributions to the Maximum Node Voltage

Contribution                        Increment (volts)   Subtotal (volts)
Nominal Power Supply                3.30                3.30
5 Percent Power Supply Tolerance    0.165               3.465
On-chip Ringing                     0.175               3.64
20 Percent Capacitive Coupling      0.66                4.30



Although power supply tolerance and power 
supply ringing cause increases in the on-chip 
power supply voltage to which all nodes are 
exposed, some internal nodes can be booted above 
V DD by capacitive coupling of voltage transitions on 
nearby nodes. The magnitude of the voltage rise 
above V DD depends on the ratio of coupling capaci- 
tance to total node capacitance, and on the mag- 
nitude of the nearby transition. Two factors limit 
the maximum voltage: clamping by p+/n diodes in
p-channel pull-up transistors, when the drain voltage
exceeds $V_{DD}$ by a forward diode drop; and turning
on of p-channel pull-up transistors, when the
node voltage exceeds $V_{DD}$ by a threshold voltage
drop. Simulations of capacitive coupling events 
in CMOS microprocessors have shown that n-well 
resistance severely limits the diode clamping abil- 
ity and that p-channel conduction is typically 
weak. Furthermore, some circuit types, e.g., virtual 
ground circuits, contain nodes without p-channel 
pull-up transistors. 

The voltage rise above V DD on internal nodes is 
often limited primarily by the amount of capacitive 
coupling. Therefore, it is important to know the
maximum possible coupling. Examination of the 
circuits in the CMOS microprocessor example 
shows that not all nodes experience capacitive 
coupling, and only a small fraction of the nodes 
are expected to reach 4.3 volts during the part of 
their operating cycle during which hot carrier dam- 
age occurs. In some cases, capacitive coupling is 
restricted by noise margin considerations. For the 
microprocessor example, the ratio of coupling 
capacitance to total nodal capacitance is limited 
to a maximum of about 20 percent. A noise margin 
limit to coupling capacitance provides an upper 
limit for the node voltage. In general, though, it is 
necessary to independently determine the extent 
of coupling for each node in the circuit that may be 
affected by hot carrier degradation. 
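
The arithmetic behind the 4.3-volt figure can be checked with a short sketch. It assumes the first-order estimate that the voltage rise on a coupled node equals the coupling ratio times the swing of the neighboring node, with no clamping by diodes or p-channel conduction. Names are hypothetical; the numbers are those of Table 1.

```python
def coupled_voltage_rise(coupling_ratio, aggressor_swing_volts):
    """First-order rise on a node: (C_coupling / C_total) * neighbor swing."""
    return coupling_ratio * aggressor_swing_volts

NOMINAL_SUPPLY = 3.30  # volts

increments = [
    ("Nominal power supply", NOMINAL_SUPPLY),
    ("5 percent power supply tolerance", 0.05 * NOMINAL_SUPPLY),
    ("On-chip ringing", 0.175),
    ("20 percent capacitive coupling", coupled_voltage_rise(0.20, NOMINAL_SUPPLY)),
]

# Accumulate the running subtotal in the same order as the table.
subtotal = 0.0
subtotals = []
for label, volts in increments:
    subtotal += volts
    subtotals.append((label, round(subtotal, 3)))

MAX_NODE_VOLTAGE = subtotals[-1][1]
```

Lowering the allowed coupling ratio for a node directly lowers the last increment, which is why a noise margin limit on coupling capacitance also bounds the node voltage.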

Waveform Timing The relative timing of gate 
and drain voltage waveforms and the extent of 
capacitive coupling are both circuit dependent. 
Therefore, it is necessary to find the worst-case 
combination of these factors in order to determine 
the circuit hot carrier lifetime for a particular chip.
Analysis of four circuit types used in CMOS micro- 
processors (complementary drivers, pass transistors 
connected to storage nodes, precharge circuits, and 
virtual ground circuits) shows that the worst-case 
hot carrier conditions typically occur in precharge 
circuits and in tri-state driver circuits. Figure 14(a) 
is an example of a precharge circuit with capacitive 
coupling between the output (node A) and a nearby 
node at voltage $V_C$ that switches from $V_{SS}$ to $V_{DD}$. The
worst-case waveform for this circuit is shown in 
Figure 14(b). This waveform was constructed using 
the circuit simulator SPICE, assuming worst-case 
capacitive coupling for that circuit type (20 percent) 
and phasing between the power supply noise and 
the coupling event to produce the maximum drain 
voltage. The factors listed in Table 1 increase the 
n-channel drain voltage during the low and medium 
gate voltage part of the waveform. This is the region 
in which hole-generated electron traps $N_{ox,h}$ and
interface states are created in the gate oxide. 

Figure 15 shows a waveform for a pass transistor 
connected to a storage node (capacitive coupling 
and ringing effects not included) that differs from 
the precharge circuit described in the previous 
paragraph in the timing between the gate and drain 
signals. The longer duration of simultaneously high 
gate and drain voltages in this circuit makes electron-generated
trap damage $N_{ox,e}$ potentially worse
than for the precharge circuit. 



Figure 14 Circuit and Waveform of a Precharge Circuit with Capacitive Coupling: (a) Circuit Diagram; (b) Waveform

It is possible to establish broad circuit categories 
that are particularly susceptible to hot carrier dam- 
age. However, the complex dependence of wave- 
form shape on circuit layout and timing requires 
that many nodes be examined individually to deter- 
mine if they present hot carrier problems. An auto- 
mated software tool such as a SPICE postprocessor 
can be very useful for this purpose, given that all 
aspects of the waveforms are modeled accurately. 

Figure 15 Circuit and Waveform of a Pass Gate to a Storage Node Showing the Timing of the Gate and Drain Voltage Signals during a Read Operation

Lifetime Criteria

The previous discussion applies to any transistor
lifetime criterion used. To determine if a process
technology produces transistors with acceptable
hot carrier degradation, it is necessary to establish
reasonable criteria from a circuit perspective.
These criteria depend on the circuit design and the
stated performance. For typical microprocessor cir- 
cuits, the performance is stated in terms of speed. 
Thus for microprocessors, the most important con- 
sideration is usually how much transistor degra- 
dation will cause the speed of a part to fall below 
the minimum of the speed bin into which it would 
initially be placed. A secondary consideration is a 
degradation of parameters that results in loss of 
functionality independent of speed. 

Circuit Speed From the standpoint of hot carrier 
degradation, the single most important parameter 
affecting circuit speed is transistor saturated drain 
current. Therefore, the key hot carrier limit is that 
the degradation in transistor saturated drain cur- 
rent not exceed some value, usually stated as a per- 
cent of the unstressed saturated drain current. 

As shown in this section, the limiting case is the 
slowest part in the fastest speed bin. If the fastest 
speed bin has a tested speed of 0.5 nanosecond 
faster than the stated performance, then the frac- 
tional degradation of the slowest transistor falling 
into the bin must be small enough that the overall 
circuit speed drops by less than 0.5 nanosecond.
For a typical 12-nanosecond complex instruction 
set computer (CISC) microprocessor, this would be 
a few percent change in saturated drain current. 
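
The "few percent" figure follows from simple arithmetic if, as a first-order assumption not stated in the text, circuit delay is taken to be inversely proportional to saturated drain current. The function name is hypothetical.

```python
def allowed_idsat_degradation(stated_cycle_ns, margin_ns):
    """Largest fractional drop in I_DSAT before the part misses its bin.

    Assumes delay is inversely proportional to I_DSAT, so a part tested at
    (stated - margin) may slow down only until it reaches the stated cycle.
    """
    tested_cycle_ns = stated_cycle_ns - margin_ns
    return 1.0 - tested_cycle_ns / stated_cycle_ns

# 12-nanosecond part with 0.5 nanosecond of margin in its speed bin.
ALLOWED_FRACTION = allowed_idsat_degradation(12.0, 0.5)
```

For these numbers the allowed degradation is about 4 percent, and it shrinks in proportion as the margin within the bin shrinks.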

Hot carrier degradation of transistors results in a 
change in reverse saturated drain current that is dif- 
ferent from the change in forward saturated drain 
current. Reverse saturated drain current is the tran- 
sistor saturated drain current measured with the 
source and drain interchanged relative to the stress 
configuration. Thus, an additional criterion is necessary
for reverse saturated drain current degradation.
The sensitivity to reverse saturated drain current
degradation is seen only in cases where the
transistor operates in both forward and reverse 
directions, e.g., in pass transistors. Although gener- 
ally not used in speed-sensitive situations, these 
transistors are sized to provide extra margin, if they 
are so used. Thus, the allowable percent degrada- 
tion in reverse saturated drain current is typically 
several times that permitted for forward saturated 
drain current. 

Circuit Functionality In addition to parameters 
that affect circuit speed, hot carrier stress also 
causes degradation in parameters that can affect 
circuit functionality, independent of speed. Most 
important among these are threshold voltage and 
transistor off-state current. 

Transistor threshold voltage shifts can alter 
inverter noise margins and indirectly affect off-state 
current. Transistor threshold voltages are also 
important in cases in which matched performance 
of a pair of devices is required, e.g., in a memory 
sense amplifier. Because the threshold voltage of 
n-channel transistors usually increases with time 
(perhaps after an initial decrease of a few milli- 
volts), off-state current does not increase. For a 
3.3-volt nominal power supply and transistors 
with ±0.5-volt threshold voltages, a threshold shift
of a few tens of millivolts is acceptable. (An 
increase in threshold voltage also affects transistor 
saturated drain current, but this is taken into 
account by a separate limit on transistor saturated
drain current.) 

The degradation in p-channel devices is usually 
not important for technologies with channel 
lengths larger than 0.5 μm. However, very short
channel length p-channel devices usually suffer a 
decrease in threshold voltage, which leads to 
increased off-state current and reduced noise mar- 
gins. These effects will be of increasing importance 
in sub-0.5-μm technologies.

Table 2 summarizes the important transistor 
degradation parameters for circuit lifetime of a 
high-performance microprocessor. 



Table 2 Typical Permitted Degradation in Device Parameters for High-performance Microprocessor Circuits

Parameter              Transistor Lifetime Criterion
Forward $I_{DSAT}$     3-10% Shift
Reverse $I_{DSAT}$     10-25% Shift
Threshold              10-100 mV Shift
Off-state Current      Absolute Limit

Note that the off-state current absolute limit may be difficult
to measure, particularly at operating temperature. This limit
can be incorporated into a limit in the decrease in magnitude
of p-channel threshold voltage. Off-current degradation is
usually not important for n-channel devices.

Speed Binning Implications
It is customary to bin the chips into one or more discrete
speed categories or bins. The circuit lifetime
of a given speed bin is equal to the time it would
take for a worst-case device in that bin to fail.
Because slower bins have a lower $I_{DSAT}$, i.e., a longer
effective channel length, and therefore have relatively
long transistor lifetimes, the lifetime issue is
more of a constraint for fast bins. For the fast bins,
the choice of where in the $I_{DSAT}$ distribution the hot
carrier reliability should be assessed is thus a major
concern. This section discusses how to determine
the worst-case transistor for a given speed bin, the
relationship between this transistor and the circuit
hot carrier lifetime of the bin, and the selection of
$I_{DSAT}$.

Although it might appear that the circuit lifetime
is determined by the fastest chip, which degrades
more rapidly, in fact, the slowest chip in a bin will
be the first to fail. Figure 16 shows the expected
$I_{DSAT}$ as a function of time for several starting $I_{DSAT}$
values. A transistor with a higher $I_{DSAT}$ degrades
more rapidly, relative to the initial $I_{DSAT}$, than a transistor
that began with a lower $I_{DSAT}$. However, even
after degradation, the curves never cross, i.e., a
transistor that initially had a higher $I_{DSAT}$ continues
to have a higher $I_{DSAT}$. If these two parts are put into
the same speed bin, the faster part will take longer
to fail, because it will take longer for $I_{DSAT}$ to drop
below the critical value for that speed bin. Thus,
very fast chips will be reliable as long as they are not
put into a very fast bin. The circuit hot carrier lifetime
of a particular speed bin is limited by the circuit
lifetime of the most marginal, i.e., the slowest,
chip in that speed bin.

Figure 16 Predicted Saturated Drain Current as a Function of Time (horizontal axis: time in years)

One approach to verifying the reliability of a fast
speed bin is to compare the distribution of the $I_{DSAT}$
values for chips in that bin with that of chips in
the next slower bin. This distribution should be
compared with $I_{DMAX}$, which is the highest $I_{DSAT}$
value that just meets the reliability requirements
for a discrete transistor. If chips with an $I_{DSAT}$ equal
to $I_{DMAX}$ normally fall into the fastest speed bin,
then that speed bin will be reliable even if it also
contains parts with an $I_{DSAT}$ greater than $I_{DMAX}$.
Although the faster parts degrade more quickly,
they have more margin to degrade and therefore do
not limit the reliability of the speed bin. Thus, chip
and transistor lifetimes may not be the same; if the
chip has extra margin in the speed bin, the chip hot
carrier lifetime will be longer than the transistor
lifetime.

The above argument applies provided there are 
no systematic changes in the parasitic capacitances. 
However, such changes, if they occur, do not pre- 
sent a serious problem because the key thicknesses 
and widths that determine the capacitances are nor- 
mally monitored in production. As a result, circuit 
reliability is limited by the slowest device with the 
lowest I DSAT in the fastest speed bin. Therefore, 
transistors used for hot carrier evaluation should 
have an $I_{DSAT}$ corresponding to the slowest chip in
the fastest bin, and the lifetime criteria are deter- 
mined by that chip. 
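
The binning argument can be illustrated with a toy calculation. Everything in the degradation model below is hypothetical: the saturated drain current is assumed to fall as I0 * (1 - D * t^k), with a coefficient D that grows with the starting current so that faster parts degrade faster. The parameter values are chosen only for illustration and are not taken from measurements.

```python
def time_to_fail(i_initial, i_critical, d_ref=0.01, i_ref=1.0, p=3.0, k=0.3):
    """Years until I_DSAT(t) = i_initial * (1 - D * t**k) reaches i_critical."""
    margin = 1.0 - i_critical / i_initial      # fractional drop the part can absorb
    if margin <= 0.0:
        return 0.0
    d = d_ref * (i_initial / i_ref) ** p        # faster parts degrade faster
    return (margin / d) ** (1.0 / k)

I_CRITICAL = 1.0  # normalized I_DSAT at the slow edge of the bin

# Normalized starting currents of three chips that fall into the same bin.
bin_chips = {"slowest": 1.03, "middle": 1.06, "fastest": 1.10}
times = {name: time_to_fail(i0, I_CRITICAL) for name, i0 in bin_chips.items()}
FIRST_TO_FAIL = min(times, key=times.get)
```

Even though the fastest chip has the largest degradation coefficient in this model, its larger margin above the bin limit outweighs the faster degradation, so the slowest chip sets the bin lifetime.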

Conclusions 

This paper presents a physical model for hot carrier 
degradation incorporating three damage modes: 

■ Medium Gate Voltages, $N_{ss}$

■ Low Gate Voltages, $N_{ox,h}$

■ High Gate Voltages, $N_{ox,e}$

A quasi-static sum of the contributions of each 
of these damage mechanisms accurately predicts 
the transistor hot carrier lifetime for any speci- 
fied waveform and technology. In addition, models 
were developed that account for the effect on tran- 
sistor hot carrier lifetimes of the dominant man- 
ufacturing variations. These models show that 
circuit waveforms, including power supply ringing,
can have a substantial effect on circuit hot carrier 
lifetime. The limiting case from the standpoint of 
speed binning is the slowest transistor falling into 
the fastest speed bin. The paper gives an example 
of circuit lifetime criteria for a high-performance 
microprocessor. From these conclusions, it is possi- 
ble to outline the following procedure for transis- 
tor hot carrier reliability assurance for a CMOS 
technology used to fabricate a particular chip: 

1. For the chip to be fabricated, determine the tran-
sistor lifetime criteria. The microprocessor is 
only one example; other circuits may have quite 
different lifetime criteria. In practice, it may be 
better to specify the transistor degradation prior 
to circuit design and incorporate the lifetime cri- 
teria into the design process; i.e., as a part of the 
design process, make sure the design works, 
given the specified degradation. 



2. Characterize transistor degradation under the 
three gate voltage ranges indicated in Figure 3 
to determine the coefficients for the damage 
integrals, equations (8)-(10), for each of the pre- 
viously determined transistor lifetime criteria. 
The results must then be scaled for transistors 
with $I_{DSAT}$, or $L_{eff}$ and $t_{ox}$, determined in step 3.
Alternatively, if step 3 has already been performed,
use transistors that match either the
desired $I_{DSAT}$, or the $L_{eff}$ and $t_{ox}$.

3. Determine either the $I_{DSAT}$, or the $L_{eff}$ and $t_{ox}$,
corresponding to the slowest transistor in the 
fastest speed bin. This can be done by simu- 
lation, or by processing a characterization lot 
with transistor parameters deliberately varied 
in order to determine the speed impact. 

4. Choose several representative parts of the 
chip circuitry to determine worst-case, time- 
dependent transistor biases. For each part, using 
the coefficients for the damage integrals deter- 
mined in step 2, calculate the transistor life- 
times. There will be several, one for each 
criterion. The shortest transistor lifetime is the 
limiting case. 
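
The selection in step 4 amounts to taking a minimum over every circuit part and every lifetime criterion. A minimal sketch, with hypothetical part names and lifetimes in years:

```python
def limiting_case(lifetimes):
    """Return (part, criterion, years) with the shortest predicted lifetime."""
    candidates = [
        (years, part, criterion)
        for part, by_criterion in lifetimes.items()
        for criterion, years in by_criterion.items()
    ]
    years, part, criterion = min(candidates)
    return part, criterion, years

# Hypothetical predicted lifetimes (years) per circuit part and criterion.
predicted = {
    "precharge node": {"forward I_DSAT": 12.0, "threshold": 40.0},
    "pass transistor": {"forward I_DSAT": 25.0, "reverse I_DSAT": 18.0},
}
LIMIT = limiting_case(predicted)
```

Keeping the part and criterion alongside the lifetime shows which circuit and which parameter to address first if the reliability goal is not met.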

The two key elements in this procedure are the 
three damage integrals with the combined transis- 
tor dynamic stress lifetime and the close connec- 
tion with the circuit design process. In practice, 
several iterations of the procedure will probably 
take place as the transistor design and production 
process are optimized for maximum circuit perfor- 
mance and yield, consistent with the necessary reli- 
ability goals. For the best results, this needs to be 
done concurrently with circuit design to be sure 
the appropriate criteria are optimized. 

Electromigration Reliability
of VLSI Interconnect

Increased speed, reduced line widths, larger chip size, and additional levels of inter- 
connect are all factors that contribute significantly to the improved performance 
and functionality of VLSI circuits. At the same time, these factors place growing 
demands on interconnect reliability. Therefore, careful characterization of the interconnect
reliability is important in achieving VLSI performance and reliability goals.
A scaling model was developed and used to examine factors essential to assuring
electromigration reliability in Digital's CMOS-4 technology and in the Alpha 21064
microprocessor, which uses this technology. 



Background 

For complex, very large-scale integration (VLSI) cir- 
cuits, the individual components that make up the 
circuit must be extremely reliable to assure acceptable
overall reliability. Since a chip is expected to
operate for many years, testing to characterize the 
reliability of circuit components such as inter- 
connects must be performed under accelerated test 
conditions. To evaluate circuit reliability by extrap- 
olating from data on components tested at acceler- 
ated rates, it is essential to have dependable models. 

Electromigration is one of the primary failure 
mechanisms in the polycrystalline aluminum-alloy 
thin films that are widely used for interconnects on 
VLSI circuits. 1 Although much has been written on 
the effect of stress conditions on interconnect fail- 
ure due to electromigration, little information is 
available on the use of lifetime measurements on 
simple test structures to evaluate the complex com- 
bination of interconnects on a VLSI chip. 

In this paper, we describe work performed to 
characterize the reliability of the interconnect 
in chips manufactured in Digital's CMOS-4 technology. 
We place particular emphasis on a model for 
scaling test structure data to chip level. Before 
presenting the model, we discuss the physics of 
electromigration and electromigration testing, as 
background information for the reader. 

Electromigration 

Electromigration is the mass transport of metal 
atoms caused by collisions with the conduction 
electrons. The momentum exchange resulting from 
these collisions creates a net flux of metal atoms in 
the direction of the electron flow and thus biases 
the normal random atomic diffusion. At sites of 
atomic flux divergence, where the number of metal 
atoms coming in does not equal the number going 
out, either material depletion or material accumula- 
tion occurs. Corresponding voids or hillocks form 
in the metal line and cause open or short circuits 
and, ultimately, circuit failure. 

For typical circuit operating temperatures, the 
diffusivity of metal atoms is much higher in the 
grain boundaries than through the lattice of the 
grain itself. (A grain is an individual crystal in a 
polycrystalline film.) Hence, the atomic mass transport 
occurs primarily along grain boundaries. There- 
fore, microstructural inhomogeneities, such as vari- 
ations in grain size, can cause a flux divergence and 
thus be sites of potential failure. 

Enhanced mass transport along grain boundaries 
is one reason that electromigration is a more impor- 
tant failure mechanism for thin films than for bulk 
conductors. In thin films, the film thickness dictates 
that the grain size be much smaller than the grain 
size for bulk material. Thus, a greater proportion of 
the thin-film cross section can be composed of 
high-diffusivity grain boundaries. 

A second reason electromigration is a concern 
for thin films is related to current density. The thin- 
film interconnect on integrated circuits is in inti- 
mate thermal contact with the underlying silicon 
substrate, which acts as a large heat sink. Therefore, 
thin films can withstand higher current densities 
than bulk wires without incurring thermal 
damage. Maximum operating current densities in 
VLSI circuits are as much as 100 to 1,000 times 
higher than normal for bulk wires. With reduced 
line widths and increased speed of operation, the 
trend is toward increased current densities in 
interconnects. This combination of high current 
densities and high-diffusivity paths along grain 
boundaries promotes electromigration in poly- 
crystalline thin-film interconnects. 

The addition of small amounts of alloying ele- 
ments can substantially improve the electro- 
migration performance of thin-film aluminum 
(Al) conductors. In particular, alloying with copper 
(Cu) has proven to be a popular, cost-effective way 
to improve lifetimes. 

Electromigration Testing 
Testing to evaluate electromigration reliability is 
accelerated using stress conditions for temperature 
and current that are higher than the expected oper- 
ating conditions. Thus, an accurate physical model 
is essential for extrapolating test results to actual 
operating conditions. 

Due to random microstructural variations, nomi- 
nally identical interconnect test structures will not 
all fail at the same time. Typically, several identical 
samples are stressed together, resulting in a distri- 
bution of failure times. Knowledge of the correct 
failure distribution is critical for predicting electro- 
migration reliability. 

Electromigration reliability testing is usually per- 
formed by stressing packaged test lines in an oven 
with a constant current, while monitoring the 
voltage in situ. Depending on the metallization, 
failure may result from open circuits or an increase 
in resistance caused by voiding, or from the for- 
mation of extrusions that short-circuit to adjacent 
lines. General guidelines for electromigration test 
structure design and lifetime measurements are 
available. 2,3 

Care must be taken in the selection of the stress 
current and temperature, so that the failure mecha- 
nism that operates under stress conditions is the 
same one that works at expected operating condi- 
tions. The temperature must be kept low enough to 
remain in the range in which grain boundary diffusion 
dominates. At very high temperatures, lattice diffu- 
sion becomes significant. 

In addition, the current must be limited to pre- 
vent excessive Joule heating. Joule heating creates 
temperature gradients along the metal line. Since 
the diffusivity of the metal atoms has an Arrhenius 
temperature dependence, a temperature gradient 
itself can cause a flux divergence and lead to failure. 

Stress Acceleration Model 
A model proposed by Shatzkes and Lloyd relates the 
time-to-failure $t_f$ to the temperature $T$ and the 
current density $j$ by 

$$t_f = A \left(\frac{T}{j}\right)^2 \exp\left(\frac{H}{kT}\right) \qquad (1)$$

where $A$ is a constant that depends on material 
properties, $k$ is Boltzmann's constant, and $H$ is 
the activation energy. 4 This formulation differs 
slightly from an earlier, largely empirical model 
proposed by Black, which did not include the 
pre-exponential $T^2$ term. 5 

The activation energy measured for aluminum 
and aluminum-alloy thin films is typically in the 
range of 0.5 to 0.8 electron volt (eV). This is signifi- 
cantly less than the activation energy for lattice dif- 
fusion measured in single-crystal aluminum, which 
is approximately 1.4 eV. These measurements indi- 
cate that the mass transport takes place along grain 
boundaries. 
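
As a rough illustration of how equation (1) can be used to extrapolate from stress to operating conditions, the sketch below computes the ratio of the time-to-failure at use conditions to that at stress conditions. The function names, the temperatures, the current-density ratio, and the activation energy of 0.7 eV (picked from inside the 0.5 to 0.8 eV range quoted above) are assumptions made for the example, not values reported in this paper.

```python
import math

K_BOLTZMANN_EV = 8.617333e-5  # Boltzmann's constant in eV/K


def time_to_failure(a_const, temp_k, j, h_ev):
    """Equation (1): t_f = A * (T/j)**2 * exp(H / kT)."""
    return a_const * (temp_k / j) ** 2 * math.exp(h_ev / (K_BOLTZMANN_EV * temp_k))


def acceleration_factor(temp_stress_k, j_stress, temp_use_k, j_use, h_ev):
    """Ratio t_f(use) / t_f(stress); the material constant A cancels."""
    use = time_to_failure(1.0, temp_use_k, j_use, h_ev)
    stress = time_to_failure(1.0, temp_stress_k, j_stress, h_ev)
    return use / stress


# Hypothetical conditions: stress at 200 C with 10x the operating current
# density, operation at 85 C. Only the ratio of current densities matters.
af = acceleration_factor(temp_stress_k=473.15, j_stress=10.0,
                         temp_use_k=358.15, j_use=1.0, h_ev=0.7)
print(f"acceleration factor: {af:.3g}")
```

Because the constant $A$ cancels in the ratio, only the two temperatures, the ratio of current densities, and the activation energy are needed for this kind of extrapolation.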

Length/Width Scaling Model 

To characterize the electromigration reliability of 
the interconnect on VLSI chips, we developed a scal- 
ing model that considers line length, line width, 
and tungsten-filled vias. 

The Lognormal Failure Distribution 
Electromigration failure times are generally repre- 
sented as a lognormal distribution, whereby a plot 
of the logarithms of the failure times relative to the 
cumulative percentage order on a normal probabil- 
ity scale approximates a straight line, as illustrated 
in Figure 1. The lognormal distribution is characterized 
by two parameters: the logarithmic mean, $\theta$, 
which is the natural logarithm of the median 
time-to-failure, and the slope, or logarithmic standard 
deviation, $\gamma$. The lognormal cumulative 
distribution function $F(t)$ is 

$$F(t) = \Phi\left(\frac{\ln t - \theta}{\gamma}\right) \qquad (2)$$

where $\Phi(x)$ is the standard normal cumulative 
distribution function. 

If $p$ represents the cumulative fraction failed, 
and $t_p$ is the corresponding time, then 

$$\ln t_p = \theta + \gamma\,\Phi^{-1}(p) \qquad (3)$$

where $\Phi^{-1}(x)$ is the inverse of the standard normal 
cumulative distribution function. For $p$ approximately 
equal to 0.16 or 0.84, $\Phi^{-1}(p)$ equals -1 and 
+1, respectively; for $p$ equal to 0.5, $\Phi^{-1}(p)$ equals 0. 
Therefore, 

$$\ln t_{16} = \theta - \gamma, \qquad \ln t_{84} = \theta + \gamma, \qquad \ln t_{50} = \theta \qquad (4)$$

These three points in time are indicated in Figure 1. 

Figure 1 Lognormal Failure Distribution 
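
The lognormal relations in equations (2) and (3) can be sketched in a few lines of Python using the standard library's `statistics.NormalDist`. The function names and the parameter values (a median of 1000 hours and a logarithmic standard deviation of 0.5) are hypothetical and serve only to show how the 16, 50, and 84 percent points follow from the two parameters.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def lognormal_cdf(t, theta, gamma):
    """Equation (2): F(t) = Phi((ln t - theta) / gamma)."""
    return STD_NORMAL.cdf((math.log(t) - theta) / gamma)


def lognormal_time_at_fraction(p, theta, gamma):
    """Equation (3): ln t_p = theta + gamma * Phi^-1(p)."""
    return math.exp(theta + gamma * STD_NORMAL.inv_cdf(p))


# Hypothetical parameters: median of 1000 hours, logarithmic std. dev. 0.5.
theta, gamma = math.log(1000.0), 0.5
for p in (0.16, 0.50, 0.84):
    print(p, round(lognormal_time_at_fraction(p, theta, gamma), 1))
```

The 0.16 and 0.84 fractions are only approximately one standard deviation from the median, which is why equation (4) is stated for approximate values of $p$.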

Length Scaling Effect 

A major conceptual problem with the lognormal 
distribution, however, is that it does not scale with 
line length. That is, if the failure times for a given 
length line are lognormally distributed, the failure 
times of shorter or longer lines cannot also be log- 
normally distributed. This problem has significant 
repercussions in extrapolating data on test devices 
to predict the reliability of VLSI circuits, particularly 
for early failures at and below the 1 percent 
level. 

Consider an ensemble of nominally identical 
lines of length $l$ that fail over time with a distribution 
described by $F(t)$. We could, in principle, combine 
these lines to form new lines of length $L = Nl$, 
where $N$ is the number of original units that make 
up the new line. The failure of each new line would 
be determined by the earliest failure time of the 
component elements, assuming that the unit elements 
behave independently. The failure distribution 
of the new ensemble is given by 

$$G(t) = 1 - \left(1 - F(t)\right)^N \qquad (5)$$

If F(t) is lognormal, then G(t) cannot be, and vice 
versa. 

For the particular case where F(t) is lognormal, 
G(t) has become known as the multilognormal or 
multiple lognormal (MLN) distribution, and is given 
by 

$$G(t) = 1 - \left(1 - \Phi\left(\frac{\ln t - \theta}{\gamma}\right)\right)^N \qquad (6)$$

Figure 2 contains a plot of the MLN distribution 
for $N = 1$ and $N = 100$. Also shown in this figure 
is a lognormal curve fitted to the $t_{16}$ and $t_{84}$ points 
of the MLN curve at $N = 100$. The logarithmic standard 
deviation of this fitted curve, $\sigma$, is a function of 
$\gamma$ and $N$. 

Figure 2 Multilognormal Failure Distribution 

There are some important characteristics to 
glean from this figure. For small sample sizes, that 
is, over the range of cumulative failure fractions 
between 0.05 and 0.95, the MLN distribution is 
nearly indistinguishable from the lognormal. However, 
the difference between the two distributions 
is more significant for early failures, at and below 
the 1 percent level. These early failures are of 
concern for reliability. Also note in Figure 2 that 
as $N$ increases, the median time-to-failure $t_{50}$ 
decreases, and the $\sigma$ of the lognormal fitted curve 
also decreases. This behavior of the model agrees 
with experimental observations on the effect of 
line length on $t_{50}$ and $\sigma$. 6 
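
A minimal sketch of the MLN distribution may help make this behavior concrete. Setting the MLN distribution of equation (6) equal to $p$ and solving for the time gives a closed form for its percentiles, $\ln t_p = \theta + \gamma\,\Phi^{-1}\left(1 - (1-p)^{1/N}\right)$. The function names below are hypothetical, and the fitted slope is taken as half the distance between the logarithms of the 84 and 16 percent points, which is an assumption about how the fit through those two points is defined.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def mln_cdf(t, theta, gamma, n_units):
    """Equation (6): G(t) = 1 - (1 - Phi((ln t - theta)/gamma))**N."""
    f = STD_NORMAL.cdf((math.log(t) - theta) / gamma)
    return 1.0 - (1.0 - f) ** n_units


def mln_log_time(p, theta, gamma, n_units):
    """ln t_p from inverting equation (6) at cumulative fraction p."""
    # Unit-level fraction q solves 1 - (1 - q)**N = p; expm1/log1p keep
    # small values of q accurate.
    q = -math.expm1(math.log1p(-p) / n_units)
    return theta + gamma * STD_NORMAL.inv_cdf(q)


def fitted_sigma(theta, gamma, n_units):
    """Slope of the lognormal line through the MLN 16% and 84% points."""
    p_lo, p_hi = STD_NORMAL.cdf(-1.0), STD_NORMAL.cdf(1.0)
    return 0.5 * (mln_log_time(p_hi, theta, gamma, n_units)
                  - mln_log_time(p_lo, theta, gamma, n_units))


# Unit element with theta = 0 and gamma = 1, for N = 1 and N = 100.
for n in (1, 100):
    t50 = math.exp(mln_log_time(0.5, 0.0, 1.0, n))
    print(n, round(t50, 4), round(fitted_sigma(0.0, 1.0, n), 4))
```

For $N = 1$ the sketch reduces to the lognormal case, and for $N = 100$ both the median and the fitted slope come out smaller, in line with the trend described for Figure 2.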

Moreover, recent work modeling electromigra- 
tion failure in fine lines yields a failure distribution 
that is well approximated by the MLN distribution. 7 
The agreement of the modeling results with experi- 
mental findings strongly supports the use of the 
MLN distribution for length scaling. 

The scaling model as described by the MLN distribution 
in equation (6) has three adjustable parameters: 
$\theta$ and $\gamma$ of the failure distribution for the 
elemental failure unit, and $N$, which is the total line 
length $L$ divided by the characteristic elemental 
length $l$. Since a line-width dependence of the 
length effect has been demonstrated, some or all of 
these parameters must be a function of line width, 
as well. 6 

Width Scaling Effect 

For line widths somewhat greater than the average 
grain size, we can expect a continuous "network" 
of grain boundaries along the length of the line. 
Since the microstructural defects leading to electro- 
migration failure are associated with the grain 
boundaries, when the line width is much larger than 
the grain size, we might expect multiple defects to 
align at a failure site. The lifetime of the line, there- 
fore, is determined by the least severe defect exist- 
ing along the width. Thus, as the width increases, 
the probability of aligning with less severe defects 
rises, and an increase in lifetime is expected. 

However, the expectation that the lifetime will 
decrease as the line width is reduced does not hold 
for very narrow lines when a "bamboo" microstructure 
develops, in which a single grain boundary traverses the 
width of the line. As the line width decreases, the 
likelihood of having a single grain span the entire 
width of the line increases. When the line width 
is comparable to the average grain size, the line 
consists of bamboo and nonbamboo segments, as 
shown in Figure 3. As the line width decreases 
further, the fraction of the line that is bamboo 
increases, that is, the length and number of non- 
bamboo segments decreases. Since electromigra- 
tion proceeds primarily along grain boundaries, 
failure correlates with the length and number of 
nonbamboo segments. Hence, as the line width 
decreases below the average grain size, the electro- 
migration lifetime improves. 

The dependence of electromigration lifetime on 
line width has been well established. 8,9,10 In the 
bamboo region, the lifetime increases very rapidly 
as the line width decreases. For line widths somewhat 
greater than the average grain size, the lifetime 
gradually increases with line width. There is, 
therefore, a minimum in lifetime in relation to the 
line width when the line width is comparable to the 
average grain size. 

Grain size in polycrystalline thin films is influ- 
enced by a number of factors including the sub- 
strate, the method by which the film is deposited, 
the deposition conditions, and the grain growth 
due to postdeposition annealing. In addition, grain 
growth and grain boundary movement during post- 
patterning annealing can be even more significant 
in fine lines. 11 

In terms of the scaling model, the expected 
improvement in lifetime with reduced line width is 
a result of two effects on the model parameters. 
Research suggests that the susceptibility of a 
grain cluster to failure decreases as the length of 
the cluster decreases. 9,12,13 If the time-to-failure of 
the unit element is related to the length of the 
nonbamboo grain clusters, $\theta$ increases as the line width 
decreases. 

More importantly, as the line width decreases, 
bamboo segments constitute a larger percentage 
of the line. Thus, the number of microstructural 
defects per unit length decreases; that is, the characteristic 
unit length $l$ increases. In our model, for a 
line of a given length $L$, as $l$ increases, the number of 
elemental failure units $N$ decreases. Consequently, 
$t_{50}$ and $\sigma$ increase, provided $\gamma$ does not change 
significantly. The increase in $t_{50}$ and $\sigma$ with decreasing 
line width given by the model is in complete 
agreement with the results obtained from electromigration 
life tests. Because the slope of the failure 
distribution is increasing together with $t_{50}$ as the 
line width decreases, it is generally recognized that 
improvement in the early failure times at and below 
the 1 percent level is much less substantial than the 
increase in $t_{50}$. 10 

The chip interconnect reliability is determined 
by the time-to-first-failure of its component lines of 
various lengths and widths. Therefore, the probabil- 
ity of chip failure can be written as 

$$H(t) = 1 - \prod_n \left(1 - F_n(t)\right)^{L_n/l_n} \qquad (7)$$

where $L_n$ is the total length of the lines of a particular 
width, and $l_n$ and $F_n(t)$ are the characteristic unit 
length and unit failure distribution for that line 
width. If the total interconnect length on the chip is 
distributed fairly evenly between narrow and wide 
lines, the chip reliability will be dominated not by 
narrow (bamboo) lines, but by the lifetime of the 
wide lines (one to two times the average grain size) 
for equivalent current densities. Thus, a qualification 
plan based on testing minimum-width lines 
exclusively is inadequate for advanced VLSI circuit 
technologies and may lead to overly optimistic 
reliability estimates. 

Digital Technical Journal Vol. 4 No. 2 Spring 1992 

Figure 3 Comparison of Bamboo and Nonbamboo Segments: 
(a) A Bamboo Microstructure in a 1.25-micron-wide Line and 
(b) A Nonbamboo Grain Cluster Network Found in a 4.25-micron-wide Line 
(Note that gb indicates the location of a grain boundary.) 

Length/Width Scaling Model 
Parameters 

In principle, it should be possible to uniquely determine 
the three model parameters $\theta$, $\gamma$, and $l$ for a given 
width, knowing $t_{50}$ and $\sigma$ from lifetime measurements 
on two different length lines. However, in 
practice, this is not the case because of the statistical 
uncertainty in estimating $t_{50}$ and $\sigma$. Estimates 
of $t_{50}$ and $\sigma$ extrapolated from test data range over 
intervals bounded by the choice of confidence 
limit. This statistical uncertainty is unavoidable but 
can be reduced by using larger sample sizes. Nevertheless, 
we can determine the model parameters 
within the constraints of these limits. 

For example, consider testing two groups of lines 
with the same line width but with different lengths, 
such that $L_2 = 10L_1$ (i.e., $N_2 = 10N_1$). For each sample 
group, a range for $\sigma$ and $t_{50}$ can be extracted from 
the resulting failure-time distributions. In Figure 4, 
the calculated $\gamma$ for the elemental failure distribution 
is shown as a function of $N_1$ for the upper and 
lower confidence limits of $\sigma$. The band of parameter 
values lying between these two curves represents 
the allowable combinations of $\gamma$ and $N$ that 
will fit the estimates of $\sigma$. Similarly, two curves 
were calculated for the upper and lower confidence 
limits on the ratio of the $t_{50}$ values for the 
two lines. The parameter space defined by the intersection 
of these two bands, as shown by the crosshatched 
area in Figure 4, indicates the combination 
of model parameters that will fit the test data. 

Figure 4 Allowed Values of Model Parameters 
$\gamma$ and $N$ Resulting from Statistical 
Uncertainty in $t_{50}$ and $\sigma$ from 
Lifetime Measurements 
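
The construction of the two bands can be sketched as follows. For each trial value of $N_1$, one function returns the unit-element logarithmic standard deviation implied by a given fitted slope of the shorter line, and another returns the value implied by a given ratio of the two medians when $N_2 = 10N_1$. The function names, the choice of the shorter line for the slope, and the numerical confidence limits are all assumptions for illustration; they are not the data behind Figure 4.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_unit(p, n_units):
    """Unit-element normal deviate when a line of N units reaches fraction p."""
    q = -math.expm1(math.log1p(-p) / n_units)
    return STD_NORMAL.inv_cdf(q)


def sigma_per_gamma(n_units):
    """Fitted slope divided by gamma, from the MLN 16% and 84% points."""
    return 0.5 * (z_unit(STD_NORMAL.cdf(1.0), n_units)
                  - z_unit(STD_NORMAL.cdf(-1.0), n_units))


def gamma_from_sigma(sigma, n1):
    """gamma that reproduces a fitted slope sigma for a line of N_1 units."""
    return sigma / sigma_per_gamma(n1)


def gamma_from_t50_ratio(ratio, n1, length_factor=10):
    """gamma that reproduces t50(short)/t50(long) for N_2 = length_factor*N_1."""
    return math.log(ratio) / (z_unit(0.5, n1) - z_unit(0.5, length_factor * n1))


# Hypothetical confidence limits, for illustration only.
sigma_limits = (0.55, 0.75)
ratio_limits = (2.5, 4.0)
for n1 in (1, 3, 10, 30, 100):
    band_sigma = [gamma_from_sigma(s, n1) for s in sigma_limits]
    band_ratio = [gamma_from_t50_ratio(r, n1) for r in ratio_limits]
    overlap = (max(min(band_sigma), min(band_ratio))
               <= min(max(band_sigma), max(band_ratio)))
    print(n1, [round(g, 2) for g in band_sigma],
          [round(g, 2) for g in band_ratio], overlap)
```

Values of $N_1$ for which the two intervals overlap correspond to the kind of intersection region that the crosshatched area represents.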

To determine appropriate model parameters for 
our metallization, we systematically studied the 
effect of interconnect length and width on lifetime. 14 
The test samples were processed through 
two levels of metallization; however, only lines 
in the first-level metal were tested. Each level of 
metal was an aluminum alloy containing 1 weight-percent 
copper (Al:1%Cu), 7500 angstroms (Å) 
thick, capped with 400 Å of titanium nitride (TiN). 

The interlevel dielectric was a planarized plasma-enhanced 
tetraethylorthosilicate (PE-TEOS) oxide 
7500 Å thick, and the wafers were passivated 
with a layer of undoped oxide 7100 Å thick. 
The average metal grain size was determined from 
transmission electron microscopy (TEM) analysis to 
be approximately 3.5 microns (µm). 

The effect of line length on lifetime is demonstrated 
in Figure 5. $t_{50}$ decreases by roughly a factor 
of 20 as the line length increases from 1.05 to 18.9 
millimeters (mm) for 1.25-µm-wide lines. The error 
bars on the data represent the 90 percent confidence 
limits. Also shown in Figure 5 is the decrease 
in $\sigma$ observed as the line length increases. The 
model parameters were determined from this test 
data; the ranges of these parameter values are given 
in Table 1. The solid lines in Figure 5 were calculated 
using the model. 

Figure 5 $t_{50}$ and $\sigma$ as Functions of Length 
for 1.25-micron-wide Lines 



Table 1 Model Parameters as a Function 
of Line Width 

                         Line Width 
Parameter                1.25 µm        4.25 µm 

$l$ (µm)                 2100-2700      26-124 
$\theta$ (ln(hours))     7.92-8.18      6.00-7.15 
$\gamma$                 0.975-1.05     0.975-1.05 


The test results shown in Figure 6 demonstrate 
the increase in lifetime with decreasing line width 
expected in the bamboo region. This data shows 
that $t_{50}$ increases by roughly a factor of 100 as 
the line width decreases from 4.25 to 1.25 µm for 
1.05-mm-long lines. The increase in $\sigma$ observed as 
the line width decreases is also shown in Figure 6. 

Since we do not yet have data on different 
lengths at line widths other than 1.25 µm, all three 
model parameters could not be independently 
determined at other widths. Since $\theta$ and $l$ are both 
expected to increase as the length and number 
of nonbamboo segments decrease with line width, 
we chose to fix $\gamma$. Parameter value ranges were 
extrapolated from the 4.25-µm-wide line data and 
are included in Table 1. The lines in Figure 6 are 
calculated from the model using the parameters 
extracted from the 1.25- and 4.25-µm line widths 
and interpolating for intermediate line widths. The 
test results for intermediate line widths of 2.0 and 
2.75 µm are shown in Figure 6 to lie on the calculated 
curve, as well. 

Figure 6 $t_{50}$ and $\sigma$ as Functions of Width 
for 1.05-millimeter-long Lines 

For line widths larger than 4.25 µm, the lifetime 
does not continue to decrease. Lifetime measurements 
on 5.75-µm-wide lines are comparable to those on the 
4.25-µm-wide lines. The lifetime gradually increases 
as the line width increases, for line widths that are 
greater than the grain size. 
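
To illustrate why wide lines can dominate, the sketch below evaluates the MLN percentiles for equal total lengths of 1.25-µm and 4.25-µm lines. The parameter values are single points picked from inside the ranges in Table 1, and the total length of 100 mm per width is hypothetical; the results therefore apply to stress conditions and are indicative only.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def mln_time(p, theta, gamma, n_units):
    """Time (hours) at cumulative fraction p for a line of N independent units."""
    q = -math.expm1(math.log1p(-p) / n_units)
    return math.exp(theta + gamma * STD_NORMAL.inv_cdf(q))


# Illustrative values chosen from inside the Table 1 ranges.
params = {
    "1.25 um": {"l_um": 2400.0, "theta": 8.05, "gamma": 1.0},
    "4.25 um": {"l_um": 75.0, "theta": 6.5, "gamma": 1.0},
}
total_length_um = 1.0e5  # hypothetical total line length at each width

results = {}
for width, prm in params.items():
    n_units = total_length_um / prm["l_um"]  # N = L / l, need not be an integer
    t50 = mln_time(0.5, prm["theta"], prm["gamma"], n_units)
    t01 = mln_time(0.01, prm["theta"], prm["gamma"], n_units)
    results[width] = (t50, t01)
    print(width, round(n_units, 1), round(t50, 1), round(t01, 1))
```

With these assumed values the wider line contains many more elemental failure units and reaches both its median and its 1 percent point earlier than the narrow line.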

Tungsten-filled Vias 

In CMOS technology development, the lateral 
dimensions of the vias have shrunk more quickly 
than the vertical thickness of the interlevel dielec- 
tric. It is difficult to reliably sputter-deposit alu- 
minum metallization to cover the steep, vertical 
sidewalls of small-diameter, high-aspect-ratio vias. 
As a result, techniques for filling high-aspect-ratio 
vias with vertical sidewalls have been developed in 
recent years using tungsten from low-pressure 
chemical vapor deposition (LPCVD). 15 

The electromigration lifetime of via chains for 
nonfilled vias decreases with reduced hole diameter. 16 
This is a direct consequence of poor metal 
sidewall coverage for small-diameter, high-aspect-ratio 
vias. 16,17 

Although filling the via with LPCVD tungsten 
improves sidewall coverage, the presence of the 
tungsten "plug" between the two aluminum lines 
introduces a discontinuity in the electromigration- 
induced mass transport of aluminum that may lead 
to failure. Therefore, it is very important to charac- 
terize the electromigration reliability of tungsten- 
filled vias. 

Testing was performed on metal-1 to metal-2 
(M1 and M2) tungsten-filled via chains fabricated 
using standard CMOS-4 processing through all three 
metal levels. Both levels of metal were 7500 Å thick 
Al:1%Cu with a 400 Å TiN cap. The interlevel dielectric 
was a planarized PE-TEOS oxide 7500 Å thick. 
The total dielectric thickness over the M2 was 
approximately 2.5 µm. The via diameter was 
0.75 µm, and there were eight vias in each chain. 
The widths of the M1 and M2 interconnections 
were identical, nominally 1.88 µm, as were the 
lengths of the metal links, which measured approximately 
170 µm. The links were long enough to eliminate 
the chance of lifetime enhancement due to 
the effect of back pressure in short lines. 13,18 

Electromigration lifetime measurements were 
carried out at temperatures of 200 and 220 degrees 
Celsius (°C) and currents of 6 and 8 milliamperes 
(mA). A 10 percent change in resistance was the failure 
criterion. The lifetime is found to be approximately 
proportional to the inverse of the square of 
the current density. This dependence is in agreement 
with other test results on tungsten-filled vias 
and with observations for single-level lines. 18 

The extrapolated activation energy is nearly 
1.2 eV. This value is considerably higher than 
that expected for grain boundary transport, and 
approaches the value for lattice diffusion in single- 
crystal aluminum (about 1.4 eV). However, since the 
line width is much smaller than the average grain 
size for these thin films, the M1 and M2 interconnects 
will have bamboo microstructures. Therefore, 
we would expect very little mass transport via 
grain boundaries. Instead, lattice diffusion, or possi- 
bly diffusion along the aluminum-oxide interface, 
would be the primary transport mechanism. 
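
The way a current exponent and an activation energy can be extracted from lifetimes measured at two currents and two temperatures is sketched below. The lifetimes are synthetic, generated from equation (1) with an activation energy of 1.2 eV, and are not the measured data; the function names and the prefactor are likewise assumptions for illustration.

```python
import math

K_BOLTZMANN_EV = 8.617333e-5  # Boltzmann's constant in eV/K


def current_exponent(t_a, i_a, t_b, i_b):
    """Exponent n in t_f ~ j**(-n) from two currents at one temperature."""
    return math.log(t_a / t_b) / math.log(i_b / i_a)


def activation_energy(t_a, temp_a_k, t_b, temp_b_k):
    """H in eV from two temperatures at one current, using equation (1)."""
    # At fixed j, equation (1) gives t_f / T**2 = const * exp(H / kT).
    lhs = math.log((t_a / temp_a_k ** 2) / (t_b / temp_b_k ** 2))
    return K_BOLTZMANN_EV * lhs / (1.0 / temp_a_k - 1.0 / temp_b_k)


def synthetic_t50(temp_k, current_ma, h_ev=1.2, a_const=1.0e-12):
    """Synthetic lifetime from equation (1); not measured data."""
    return a_const * (temp_k / current_ma) ** 2 * math.exp(
        h_ev / (K_BOLTZMANN_EV * temp_k))


t_200c_6ma = synthetic_t50(473.15, 6.0)
t_200c_8ma = synthetic_t50(473.15, 8.0)
t_220c_6ma = synthetic_t50(493.15, 6.0)
print(round(current_exponent(t_200c_6ma, 6.0, t_200c_8ma, 8.0), 3))
print(round(activation_energy(t_200c_6ma, 473.15, t_220c_6ma, 493.15), 3))
```

For a fixed geometry the current and the current density are proportional, so the exponent can be computed directly from the stress currents.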

We also considered the possibility of local heating 
at the via because of the higher resistivity of the 
via-fill material and the reduced cross-sectional 
area for current flow. Three-dimensional finite ele- 
ment models of the via structure were used in 
simulations to calculate the current density distri- 
bution and temperature profiles resulting from 
Joule heating, as shown in Figure 7. These simula- 
tions indicate that in the entire structure, the tem- 
perature increase above ambient is less than 3 °C 
under the worst-case current stress of 8 mA. Due 
to the good heat conductivity of aluminum, the 
temperature gradients are small, and the effect on 
electromigration lifetime is negligible. 

Figure 8 shows a scanning electron microscope 
(SEM) micrograph of a stressed via that 
has been cross-sectioned using a focused ion beam 
(FIB). The transport of aluminum away from the 
aluminum-tungsten interface in the direction of electron 
current flow is apparent. The FIB/SEM analysis 
of stressed via chains shows voiding at both interfaces, 
with no indication of preferential failure at 
either the top or the bottom interface. 

Clearly, via failure is caused by void formation 
resulting from the migration of aluminum away from 
the tungsten plug. In the future, the reliability of 
submicron via interconnects may be improved by 
developing processes to replace the tungsten with 
aluminum to fill the via. 

Assuming that the failure distribution for a single 
via is lognormal, the failure distribution for the via 
chain is also MLN. In this case, since there is no 
apparent lifetime dependence on the direction of 
current flow, i.e., from M1 to M2 or vice versa, the 
value of $N$ is the number of vias in the chain, i.e., 
$N = 8$. For a single via, $\theta$ and $\gamma$ can then be determined 
from via chain test results using the MLN distribution 
as given in equation (6). 
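
One way to carry out this step is sketched below: the single-via parameters are recovered in closed form from the median and fitted slope of an eight-via chain by inverting the MLN percentiles. The function names and the chain results (a median of 500 hours and a slope of 0.4) are hypothetical, and defining the slope from the 16 and 84 percent points is an assumption.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
N_VIAS = 8  # vias per chain in the test structure


def z_unit(p, n_units):
    """Single-via normal deviate when a chain of N vias reaches fraction p."""
    q = -math.expm1(math.log1p(-p) / n_units)
    return STD_NORMAL.inv_cdf(q)


def single_via_parameters(t50_chain, sigma_chain, n_vias=N_VIAS):
    """Return (theta, gamma) of one via from the chain t_50 and fitted slope."""
    spread = 0.5 * (z_unit(STD_NORMAL.cdf(1.0), n_vias)
                    - z_unit(STD_NORMAL.cdf(-1.0), n_vias))
    gamma = sigma_chain / spread
    theta = math.log(t50_chain) - gamma * z_unit(0.5, n_vias)
    return theta, gamma


# Hypothetical chain results, for illustration only.
theta, gamma = single_via_parameters(t50_chain=500.0, sigma_chain=0.4)
print(round(math.exp(theta), 1), round(gamma, 3))
```

The single-via median comes out longer than the chain median, and the single-via slope larger than the chain slope, because the chain fails at the earliest of its eight via failures.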

The chip scaling model can be easily modified to 
include vias as separate failure elements. The result- 
ing probability for chip failure is given by 

$$H(t) = 1 - \prod_n \left(1 - F_n(t)\right)^{L_n/l_n} \prod_m \left(1 - F_m(t)\right)^{N_m} \qquad (8)$$

where $N_m$ is the number of vias of a particular type, 
and $F_m(t)$ is the failure distribution for a single via. 

Electromigration Reliability 
Qualification 

The Alpha 21064 microprocessor is the largest and 
fastest chip built using the CMOS-4 technology. 
Careful analysis of this chip determined the number 
of vias and the total length of lines (as a function of 
line width) that carry currents near the electromigration 
design rule limit. Interconnects with 
currents much less than the design rule limit have 
negligible impact on the electromigration reliabil- 
ity. With this information, we can then calculate the 
appropriate performance requirements for the test 
structures to meet our chip reliability goal. 

Our reliability goal is to assure a less than 1 per- 
cent probability of chip failure in 10 years under 
worst-case operating conditions. The stress accel- 
eration model described in equation (1) is used to 
extrapolate from accelerated stress conditions to 
worst-case operating conditions. Scaling of test 
data to the chip level is accomplished by use of 
the multilognormal approximation to the failure 
distribution. 

The entire circuit can be considered to consist of 
several component groups, where each group 
includes a particular type of interconnect element, 
e.g., a certain via type or line width. The lognormal 
unit failure distribution for the $i$th group, $F_i(t)$, is 
characterized by $\gamma_i$ and $\theta_i$. Rewriting equation (8) 
in a slightly more compact form, the cumulative 
probability of chip failure is 

$$H(t) = 1 - \prod_i S_i(t) \qquad (9)$$

where 

$$S_i(t) = \left(1 - F_i(t)\right)^{N_i} \qquad (10)$$

and $N_i$ is the number of components from the $i$th 
group on the chip. For vias, $N_i$ is the number of 
vias; for lines, $N_i$ is the total length divided by the 
characteristic unit length, which depends on line 
width. Since the reliability goal is for $H(t)$ to be less 
than 0.01 for 10 years under worst-case operating 
conditions, then for each group, $F_i(t)$ is much less 
than 0.01. 
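
A minimal sketch of equations (9) and (10) follows. The three component groups, their parameter values at operating conditions, and their unit counts are hypothetical and chosen only so that the example produces a chip failure probability of the same order as the goal; the function names are also assumptions.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def group_log_survival(t, theta, gamma, n_units):
    """ln S_i(t) from equation (10), with a lognormal unit distribution."""
    f = STD_NORMAL.cdf((math.log(t) - theta) / gamma)
    return n_units * math.log1p(-f)  # log1p keeps tiny f accurate


def chip_failure_probability(t, groups):
    """Equation (9): H(t) = 1 - prod_i S_i(t)."""
    log_survival = sum(
        group_log_survival(t, g["theta"], g["gamma"], g["n_units"])
        for g in groups)
    return -math.expm1(log_survival)


# Hypothetical groups at operating conditions; theta is in ln(hours).
groups = [
    {"name": "narrow lines", "theta": 17.5, "gamma": 1.0, "n_units": 400},
    {"name": "wide lines", "theta": 16.0, "gamma": 1.0, "n_units": 2000},
    {"name": "vias", "theta": 16.0, "gamma": 0.8, "n_units": 5000},
]
ten_years_hours = 10 * 365.25 * 24
h10 = chip_failure_probability(ten_years_hours, groups)
print(f"H(10 years) = {h10:.2e}, goal met: {h10 < 0.01}")
```

Summing logarithms of the group survival probabilities, rather than multiplying the probabilities directly, avoids loss of precision when each unit failure probability is very small.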

As mentioned previously, uncertainty in the 
values for $\gamma$, $\theta$, and $l$ stems from the uncertainty 
inherent in estimating $t_{50}$ and $\sigma$ to a given level of 
confidence from experimental data. It is illuminating 
to examine the impact this inherent uncertainty 
has on the performance requirements for the 
test structures to meet the chip reliability goal. 

Consider the simple case of only one component 
group, namely a single line width. The required test 
structure $t_{50,\mathrm{test}}$ relative to the time until 1 percent 
cumulative failure for the chip occurs, $t_{01,\mathrm{chip}}$, is plotted 
in Figure 9. This ratio is graphed as a function 
of the ratio of the number of units in the test structure 
to that of the chip, $N_{\mathrm{test}}/N_{\mathrm{chip}}$, for a given combination 
of $N$ and $\gamma$. For this illustration, we use 
points from the example shown in Figure 4 for comparison. 

Figure 7 Finite Element Simulations of Current Density and Temperature Distributions 
in Tungsten-filled Vias: (a) Current Density (mA/µm²) and (b) Temperature Increase (°C) 

Figure 8 Secondary Electron Micrograph 
of Electromigration Damage 
at a Tungsten-filled Via 

Points A and C both lie on the edge of the allowed 
parameter space in Figure 4, but $\gamma$ and $N$ are both 
larger for C. Figure 9 shows that the requirements 
on $t_{50}$ are greater for C than for A. However, 
the relative effect of increasing $N$ or $\gamma$ separately is 
not clear. Increasing $N$ alone, that is, comparing 
point B1 to A, actually results in a decrease in the 
test requirements, as shown in Figure 9. This effect 
weakens as the $N_{\mathrm{test}}/N_{\mathrm{chip}}$ ratio approaches unity. 
Thus, the test line should be as long as practical to 
mitigate the impact of uncertainty in $l$. The increase 
in $t_{50,\mathrm{test}}/t_{01,\mathrm{chip}}$, therefore, is a result of increasing $\gamma$. 
The sensitivity to increasing $\gamma$ alone can be seen in 
Figure 9, by comparing point B2 to A. 
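
The ratio plotted in Figure 9 can be sketched from the MLN percentiles for a single line width. Because the test structure and the chip share the same unit-element logarithmic mean, that parameter cancels and the ratio depends only on the logarithmic standard deviation and on the two unit counts. The function names are hypothetical; the four parameter combinations are those listed in the key of Figure 9, and treating $N$ as the number of units in the test structure is an assumption.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_unit(p, n_units):
    """Unit-element normal deviate when N units together reach fraction p."""
    q = -math.expm1(math.log1p(-p) / n_units)
    return STD_NORMAL.inv_cdf(q)


def required_t50_ratio(gamma, n_test, test_to_chip_ratio):
    """t_50 of the test structure divided by t_01 of the chip."""
    n_chip = n_test / test_to_chip_ratio
    return math.exp(gamma * (z_unit(0.5, n_test) - z_unit(0.01, n_chip)))


for gamma, n_test in ((1.5, 10), (1.5, 200), (2.09, 10), (2.09, 200)):
    row = [required_t50_ratio(gamma, n_test, r) for r in (1e-5, 1e-3, 1e-1, 1.0)]
    print(gamma, n_test, [f"{x:.3g}" for x in row])
```

In this sketch the required ratio grows with the logarithmic standard deviation and shrinks as the test structure contains more units or approaches the chip in size, which matches the trends described for points A, B1, and B2.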

To assure that the chip reliability goals are met, 
the most conservative combination of parameter 
estimates should be used. These values are the 
most stringent in setting and meeting the test performance 
requirements. The most rigorous test performance 
requirements are set by using the largest 
possible value for $\gamma$. In the example presented in 
Figure 9, this value of $\gamma$ implies the largest value of 
$N$, i.e., the smallest value for $l$, within the ranges 
set by the confidence limits on $\sigma$ and $t_{50}$. The most 
stringent criterion for meeting the test performance 
requirements is to use the lower limit of the 
confidence interval for $t_{50}$ from the test data, which 
corresponds to the minimum $\theta$ value consistent 
with the values of $\gamma$ and $N$. 

Figure 9 Requirements on $t_{50}$ for the Test 
Structure Relative to $t_{01}$ for the 
Chip as a Function of the Ratio 
of the Number of Elements in the 
Test Structure to That of the Chip 
(Horizontal axis: $N_{\mathrm{test}}/N_{\mathrm{chip}}$, from $10^{-5}$ to 1. 
Key: $\gamma = 1.5$, $N = 10$; $\gamma = 1.5$, $N = 200$; 
$\gamma = 2.09$, $N = 10$; $\gamma = 2.09$, $N = 200$.) 

Electromigration lifetimes, and thus the scaling 
model parameters, are a sensitive function of the 
microstructure of the conductor film. 19 Therefore, 
since the effect of normal microstructural varia- 
tions cannot be unambiguously determined a pri- 
ori, a number of lots must be tested to assess the 
effects of lot-to-lot variations. 

Each lot undergoes a statistical test of the 
hypothesis 

$$H(10\ \mathrm{years}) \le 0.01 \qquad (11)$$

where $H(10\ \mathrm{years})$ denotes the cumulative probability 
of chip failure in the first 10 years of continuous 
operation under worst-case conditions. To pass 
the test, the probability of accepting the hypothesis 
when $H(10\ \mathrm{years})$ is greater than 0.01 must be less 
than 0.1. This criterion gives at least 90 percent 
confidence that a test group passing the statistical 
test, i.e., for which the hypothesis is accepted, 

■ Does indeed come from a lot that meets the 
requirement of not failing more than 1 percent of 
the time in 10 years 

■ Is not a statistical fluke 

The statistical analysis procedures used to imple- 
ment this model in the electromigration qualifi- 
cation testing are coded in a software tool. The 
software extracts the most conservative MLN 
model parameters from the failure-time distribu- 
tion measured for every type of structure tested for 
each lot. This tool can be used to perform the statis- 
tical test just described to verify that the reliability 
goal has been met. 

Summary 

Interconnect electromigration reliability becomes 
increasingly important with each step in the evolu- 
tion of CMOS technology. Therefore, it is necessary 
to rigorously characterize the various components 
of the circuit metallization and to develop dependable 
models to relate test device data to long-term 
chip reliability. 

We have presented a scaling model for relating 
the results of accelerated electromigration life tests 
on test structures to the overall chip reliability. This 
model was used as the basis for formulating qualifi- 
cation requirements for electromigration reliability 
assurance of the CMOS-4 process technology and 
the Alpha 21064 microprocessor. 